👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/chat_with_pdf

Chat with PDF in .NET Applications


🎯 Purpose of the Sample

Chat with PDF demonstrates how to use LM-Kit.NET to build an intelligent document question-answering system that combines vision-capable models with semantic retrieval to answer questions about PDF documents.

The sample shows how to:

  • Download and load vision models and embedding models with progress callbacks.
  • Create a PdfChat instance that handles document understanding automatically.
  • Configure document caching using the IVectorStore interface for persistent indexing.
  • Choose between text extraction (faster) and vision-based document understanding (better for complex layouts).
  • Load one or multiple PDF documents with automatic caching and indexing.
  • Ask natural language questions and receive grounded, context-aware responses.
  • Monitor document import progress, cache hits, and passage retrieval through events.

Why PDF Chat with LM-Kit.NET?

  • Local-first: all processing runs on your hardware; no cloud dependencies for sensitive documents.
  • Smart context management: small documents are included in full; large documents use intelligent passage retrieval.
  • Vision-powered: optionally use vision models to understand complex layouts, tables, and scanned pages.
  • Rich telemetry: track document processing, cache utilization, retrieval performance, and generation stats.
  • Multi-document: load multiple PDFs and query across all of them in a single conversation.
  • Flexible caching: use the built-in filesystem cache or implement custom storage backends.

👥 Target Audience

  • Enterprise Developers: build secure, on-premise document QA systems.
  • Legal & Compliance: analyze contracts, policies, and regulatory documents locally.
  • Research & Academia: query research papers, reports, and technical documentation.
  • Back-Office & RPA: automate document analysis workflows without cloud exposure.
  • Demo & Education: explore RAG (Retrieval-Augmented Generation) concepts in a practical C# example.

🚀 Problem Solved

  • Ask questions in natural language: query PDFs like you would ask a knowledgeable assistant.
  • Handle complex layouts: vision-based understanding interprets tables, multi-column text, and forms.
  • Manage large documents: automatic chunking and semantic retrieval for documents that exceed context limits.
  • Multi-document queries: load several PDFs and ask questions that span multiple sources.
  • Performance optimization: built-in caching via IVectorStore eliminates redundant processing on repeated loads.
  • Source transparency: see which passages were used to generate each answer.

💻 Sample Application Description

Console app that:

  • Lets you choose a vision model for chat (or paste a custom model URI).
  • Automatically loads an embedding model for semantic search.
  • Offers two processing modes: standard text extraction or vision-based document understanding.
  • Downloads models if needed, with live progress updates.
  • Creates a PdfChat instance with filesystem-based caching via FileSystemVectorStore.
  • Prompts you to load one or more PDF documents.
  • Enters an interactive chat loop where you can:
    • Ask questions about your documents.
    • See which passages were retrieved and from which pages.
    • View generation statistics (tokens, speed, context usage).
  • Supports commands for managing documents and conversation state.
  • Loops until you press Enter on an empty prompt to quit.

✨ Key Features

  • 📚 Dual processing modes:
    • Standard extraction: fast text-based processing with OCR fallback.
    • Vision understanding: multimodal analysis for complex layouts using VlmOcr.
  • 🔍 Smart retrieval: automatically decides between full-context and passage retrieval based on document size.
  • 💾 Pluggable caching: use FileSystemVectorStore or implement custom IVectorStore backends.
  • 📊 Rich event system:
    • Document import progress (page-by-page processing).
    • Cache hit/miss notifications.
    • Passage retrieval with timing and source references.
    • Response generation status.
  • 💬 Multi-turn conversation: follow-up questions maintain context for natural dialogue.
  • 🔄 Conversation commands: reset, restart, add documents, regenerate, view status.
  • 📈 Generation stats: tokens generated, speed, and context utilization per response.

🧰 Built-In Models (menu)

On startup, the sample shows a model selection menu:

Option | Model                            | Approx. VRAM Needed
------ | -------------------------------- | -------------------
0      | MiniCPM-o 2.6 8.1B               | ~5.9 GB
1      | Alibaba Qwen 3 2B (vision)       | ~2.5 GB
2      | Alibaba Qwen 3 4B (vision)       | ~4 GB
3      | Alibaba Qwen 3 8B (vision)       | ~6.5 GB
4      | Google Gemma 3 4B (vision)       | ~5.7 GB
5      | Google Gemma 3 12B (vision)      | ~11 GB
6      | Mistral Ministral 3 3B (vision)  | ~3.5 GB
7      | Mistral Ministral 3 8B (vision)  | ~6.5 GB
8      | Mistral Ministral 3 14B (vision) | ~12 GB
other  | Custom model URI                 | depends on model

Any input other than 0-8 is treated as a custom model URI and passed directly to the LM constructor.
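
A minimal sketch of how that resolution might look (illustrative, not the sample's exact code; the option-to-ID mapping simply mirrors the tables in this README):

var menu = new Dictionary<string, string>
{
    ["0"] = "minicpm-o",
    ["1"] = "qwen3-vl:2b",   ["2"] = "qwen3-vl:4b",   ["3"] = "qwen3-vl:8b",
    ["4"] = "gemma3:4b",     ["5"] = "gemma3:12b",
    ["6"] = "ministral3:3b", ["7"] = "ministral3:8b", ["8"] = "ministral3:14b"
};

string input = (Console.ReadLine() ?? string.Empty).Trim();
string modelUri;
if (menu.TryGetValue(input, out var modelId))
    modelUri = ModelCard.GetPredefinedModelCardByModelID(modelId).ModelUri.ToString();
else
    modelUri = input; // anything else is treated as a custom model URI

var chatModel = new LM(new Uri(modelUri));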

Additional models loaded automatically:

  • Embedding model: embeddinggemma-300m - used for semantic passage retrieval.
  • Vision OCR model (when using document understanding): lightonocr1025:1b - lightweight vision model for page analysis.

🧠 Supported Models

The sample is pre-wired to LM-Kit's predefined model cards:

Chat models:

  • minicpm-o
  • qwen3-vl:2b / qwen3-vl:4b / qwen3-vl:8b
  • gemma3:4b / gemma3:12b
  • ministral3:3b / ministral3:8b / ministral3:14b

Embedding model:

  • embeddinggemma-300m

Document understanding model:

  • lightonocr1025:1b

Internally:

// Chat model selection
modelUri = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri;

// Embedding model (auto-loaded)
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

// Vision OCR for document understanding
var visionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.


πŸ› οΈ Commands & Flow

Startup Flow

  1. Model selection: choose a vision-capable chat model (0-8) or paste a custom URI.
  2. Model loading: chat and embedding models download (if needed) and load with progress reporting.
  3. Processing mode selection:
    • 0 - Standard text extraction (faster, uses Tesseract OCR as fallback).
    • 1 - Vision-based document understanding (better for complex layouts, loads an additional vision model).
  4. Document loading: prompted to enter PDF path(s). Can load multiple documents.
  5. Chat loop: ask questions, receive answers with source references.

Interactive Commands

Inside the chat loop, type these commands instead of a question:

Command     | Description
----------- | -----------
/help       | Show all available commands with descriptions.
/status     | Display loaded documents, token usage, and configuration details.
/add        | Add more PDF documents to the current collection (clears chat history).
/restart    | Clear chat history but keep all loaded documents.
/reset      | Remove all documents and clear chat history (prompts for new documents).
/regenerate | Generate a new response to your last question.
(empty)     | Press Enter on an empty prompt to exit the application.
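
A minimal sketch of how such a loop might dispatch commands before treating input as a question (the helper calls are placeholders; only ClearDocuments() and SubmitAsync() appear elsewhere in this README):

while (true)
{
    Console.Write("> ");
    string input = (Console.ReadLine() ?? string.Empty).Trim();

    if (input.Length == 0)
        break; // empty prompt exits

    switch (input.ToLowerInvariant())
    {
        case "/reset":
            chat.ClearDocuments(); // documented: removes all documents
            break;
        case "/help":
        case "/status":
        case "/add":
        case "/restart":
        case "/regenerate":
            // Handled by sample-internal helpers; see the table above.
            break;
        default:
            var answer = await chat.SubmitAsync(input); // anything else is a question
            break;
    }
}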

Per-Question Flow

  1. Type your question and press Enter.
  2. Passage retrieval (if using chunked documents): relevant passages are retrieved with timing info.
  3. Response generation: the model generates an answer, streamed to console in real-time.
  4. Stats display: tokens generated, generation speed, and context utilization.
  5. Repeat or use a command.

πŸ—£οΈ Example Use Cases

Try the sample with:

  • A contract or legal agreement → ask "What are the termination conditions?" or "Summarize the payment terms."
  • A research paper → ask "What methodology did the authors use?" or "What were the main findings?"
  • A technical manual → ask "How do I configure the network settings?" or "What are the system requirements?"
  • A financial report → ask "What was the revenue growth year-over-year?" or "Summarize the risk factors."
  • Multiple related documents → load several PDFs and ask "Compare the approaches described in these papers."

After each response, inspect:

  • Retrieved passages: do they match the most relevant sections?
  • Source references: correct document and page numbers?
  • Generation stats: acceptable latency for your use case?

📊 Document Processing Modes

Standard Text Extraction

Processing Mode: 0 - Standard text extraction (faster)
  • Extracts text directly from PDF structure.
  • Uses Tesseract OCR as fallback for scanned/image-based pages.
  • Best for: clean PDFs with simple layouts, text-heavy documents.

Vision-Based Document Understanding

Processing Mode: 1 - Vision-based document understanding (better for complex layouts)
  • Loads an additional vision model (lightonocr1025:1b).
  • Analyzes pages visually to understand layout, structure, and relationships.
  • Best for: multi-column layouts, tables, forms, scanned documents, mixed content.
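
Both modes are configured on the PdfChat instance before loading documents. A sketch using the property names that appear later in this README (the parameterless TesseractOcr constructor is an assumption):

// Standard extraction: direct text extraction with Tesseract OCR as fallback.
// (The parameterless TesseractOcr constructor is assumed for this sketch.)
chat.OcrEngine = new TesseractOcr();

// Vision-based understanding: an extra vision model parses each page image.
chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));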

📦 Document Indexing Modes

When you load a document, PdfChat automatically decides how to process it:

Full Document Mode

  • When: document fits within FullDocumentTokenBudget (default: 4096 tokens).
  • Behavior: entire document is included in every query context.
  • Advantage: model has complete context, best for small documents.
  • Trade-off: uses more context space, limiting room for conversation history.

Passage Retrieval Mode

  • When: document exceeds the token budget.
  • Behavior: document is chunked and indexed for semantic search.
  • Advantage: handles documents of any size efficiently.
  • Trade-off: only relevant passages are included (may miss tangentially related content).

The sample reports which mode was used after loading:

✓ Loaded: report.pdf
    Pages: 45
    Tokens: 28,500
    Mode: passage retrieval
    ⚠ Exceeded token budget → using passage retrieval
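
The same information is available programmatically on the DocumentIndexingResult returned by the load call (see Result Types below for the full member list):

var result = await chat.LoadDocumentAsync("report.pdf");

Console.WriteLine($"Loaded: {result.Name}");
Console.WriteLine($"  Pages:  {result.PageCount}");
Console.WriteLine($"  Tokens: {result.TokenCount}");
Console.WriteLine($"  Mode:   {result.IndexingMode}"); // FullDocument or PassageRetrieval

if (result.ExceededTokenBudget)
    Console.WriteLine("  Exceeded token budget; passage retrieval will be used.");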

💾 Document Caching with IVectorStore

PdfChat supports persistent caching of processed documents through the IVectorStore interface. This eliminates redundant processing when the same document is loaded across sessions.

FileSystemVectorStore (Built-in)

The sample uses FileSystemVectorStore for local filesystem caching:

using LMKit.Data.Storage;

string cacheDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "LMKit", "ChatWithPDF", "Cache");

IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

// Pass to PdfChat constructor
PdfChat chat = new PdfChat(chatModel, embeddingModel, vectorStore);

Cache Behavior

  • On cache hit: document loads instantly from pre-indexed data.
  • On cache miss: full processing (page extraction, embedding generation) occurs, then results are cached.

Monitor cache access via the CacheAccessed event:

chat.CacheAccessed += (sender, e) =>
{
    if (e.IsHit)
        Console.WriteLine($"Cache hit: {e.DocumentName} loaded instantly");
    else
        Console.WriteLine($"Cache miss: {e.DocumentName} will be processed");
};

Disabling Caching

To disable caching, omit the vectorStore parameter:

// No caching - documents processed fresh each time
PdfChat chat = new PdfChat(chatModel, embeddingModel);

Custom Vector Store Implementations

Implement IVectorStore for custom backends (databases, cloud storage, etc.):

public class CustomVectorStore : IVectorStore
{
    // Implement interface methods for your storage backend
    public Task<bool> CollectionExistsAsync(string collectionId, CancellationToken ct) { ... }
    // ... other methods
}

📋 Document Metadata

When loading documents, you can attach rich metadata for source tracking and attribution:

DocumentMetadata Class

var metadata = new DocumentMetadata("Q4 Financial Report")
{
    SourceUri = "https://intranet.example.com/docs/q4-report.pdf",
    AdditionalMetadata = new MetadataCollection
    {
        { "author", "Finance Team" },
        { "department", "Corporate Finance" },
        { "confidentiality", "Internal" },
        { "fiscal_year", "2024" }
    }
};

var result = await chat.LoadDocumentAsync("report.pdf", metadata);

Accessing Metadata in Query Results

Source references include the metadata you specified:

var response = await chat.SubmitAsync("What was the Q4 revenue?");

foreach (var source in response.SourceReferences)
{
    Console.WriteLine($"Source: {source.Name}, Page {source.PageNumber}");
    
    if (source.Metadata?.AdditionalMetadata != null)
    {
        if (source.Metadata.AdditionalMetadata.TryGet("author", out var author))
            Console.WriteLine($"  Author: {author.Value}");
        if (source.Metadata.AdditionalMetadata.TryGet("department", out var dept))
            Console.WriteLine($"  Department: {dept.Value}");
    }
}

Default Metadata

If no metadata is provided, PdfChat creates default metadata using the file name:

// Equivalent to: new DocumentMetadata(Path.GetFileName(filePath))
var result = chat.LoadDocument("report.pdf");

🔧 Advanced Configuration

Reranking for Improved Retrieval

Use a cross-encoder reranker to improve passage retrieval accuracy:

using LMKit.Retrieval;

// Load a reranker model
var rerankerModel = LM.LoadFromModelID("reranker-model-id");

// Configure reranking
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);

The RerankAlpha parameter (0.0–1.0) controls blending between raw embedding similarity and rerank score:

  • 0.0: use only embedding similarity
  • 1.0: use only rerank score
  • 0.7 (recommended): blend favoring rerank score
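
Assuming the usual linear interpolation (the exact internal formula is not shown in this sample), the blended score would look like:

// Presumed linear blend between the two scores; illustrative only.
float finalScore = (1 - rerankAlpha) * embeddingSimilarity + rerankAlpha * rerankScore;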

Visual Grounding with Page Renderings

Include page images alongside retrieved passages for visual context:

// Enable page renderings in retrieval context
chat.IncludePageRenderingsInContext = true;

When enabled:

  • Page images corresponding to retrieved passages are injected into the context.
  • Allows the model to visually interpret tables, charts, and figures.
  • Increases token consumption proportionally to unique pages referenced.
  • Requires a vision-capable chat model for best results.

Document Processing Modality

Control whether pages are processed as images or text-only:

// Multimodal: pages may be processed as images (default)
chat.DocumentProcessingModality = InferenceModality.Multimodal;

// Text-only: use extracted text only
chat.DocumentProcessingModality = InferenceModality.Text;

Note: This property must be set before loading documents.

Retrieval Parameters

Fine-tune passage retrieval behavior:

// Maximum passages retrieved per query (default: 5)
chat.MaxRetrievedPassages = 10;

// Minimum relevance score threshold (default: 0.5)
// Higher = fewer but more precise matches
chat.MinRelevanceScore = 0.6f;

// Token budget for full-document mode (default: 4096)
// Documents exceeding this use passage retrieval
chat.FullDocumentTokenBudget = 8192;

// Prefer including small documents in full (default: true)
chat.PreferFullDocumentContext = true;

Reasoning and Sampling

Configure response generation behavior:

// Enable extended reasoning for complex questions
chat.ReasoningLevel = ReasoningLevel.Extended;

// Use temperature-based sampling for varied responses
chat.SamplingMode = new RandomSampling { Temperature = 0.7f };

// Or stick with deterministic outputs (default)
chat.SamplingMode = new GreedyDecoding();

// Maximum tokens per response (default: 2048)
chat.MaximumCompletionTokens = 4096;

βš™οΈ Behavior & Policies (quick reference)

  • Model selection: exactly one chat model per process. To change models, restart the app.
  • Download & load (see the sketch after this list):
    • ModelDownloadingProgress prints Downloading model XX.XX% or byte counts.
    • ModelLoadingProgress prints Loading model XX% and clears the console once done.
  • Document caching:
    • Uses the IVectorStore interface (the sample uses FileSystemVectorStore).
    • Default cache location: %LocalAppData%\LMKit\ChatWithPDF\Cache
    • On cache hit: instant load from pre-indexed data.
    • On cache miss: full processing (page extraction, embedding generation).
  • Passage retrieval:
    • Default MaxRetrievedPassages: 5
    • Default MinRelevanceScore: 0.5
    • Results are sorted by document, then page number, then partition index.
  • Response generation:
    • Streaming output via the AfterTextCompletion event.
    • Internal reasoning is shown in dark blue.
    • Tool invocations are shown in dark yellow.
  • Licensing:
    • You can set an optional license key via LicenseManager.SetLicenseKey("").
    • A free community license is available from the LM-Kit website.
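
A sketch of wiring the progress reporting mentioned above. The event names come from this reference, but the handler signatures, the argument member (Progress), and whether the events live on the LM instance are assumptions for this sketch; adapt them to the actual sample code:

// Illustrative progress wiring; argument members are assumptions.
chatModel.ModelDownloadingProgress += (s, e) =>
    Console.Write($"\rDownloading model {e.Progress:F2}%");

chatModel.ModelLoadingProgress += (s, e) =>
{
    Console.Write($"\rLoading model {e.Progress:F0}%");
    if (e.Progress >= 100)
        Console.Clear(); // the sample clears the console once loading completes
};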

💻 Minimal Integration Snippet

using System;
using System.IO;
using System.Threading.Tasks;
using LMKit.Data.Storage;
using LMKit.Extraction.Ocr;
using LMKit.Model;
using LMKit.Retrieval;

public class PdfChatSample
{
    public async Task RunChat(string modelUri, string pdfPath)
    {
        // Load models
        var chatModel = new LM(new Uri(modelUri));
        var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

        // Configure caching with vector store
        string cacheDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "LMKit", "PdfChat", "Cache");
        IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

        // Create PdfChat instance with caching
        using var chat = new PdfChat(chatModel, embeddingModel, vectorStore)
        {
            PreferFullDocumentContext = true
        };

        // Optional: enable vision-based document understanding
        chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
        chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

        // Optional: include page images in retrieval context
        // chat.IncludePageRenderingsInContext = true;

        // Subscribe to events
        chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);
        chat.PassageRetrievalCompleted += (s, e) =>
            Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

        // Load document with optional metadata
        var metadata = new DocumentMetadata(Path.GetFileName(pdfPath))
        {
            SourceUri = pdfPath
        };
        var result = await chat.LoadDocumentAsync(pdfPath, metadata);
        Console.WriteLine($"Loaded: {result.Name} ({result.IndexingMode})");

        // Ask questions
        var response = await chat.SubmitAsync("What are the key points in this document?");
        
        Console.WriteLine();
        Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
        Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");
    }
}

Use this pattern to integrate PDF chat into web APIs, desktop apps, or document processing pipelines.


πŸ› οΈ Getting Started

πŸ“‹ Prerequisites

  • .NET Framework 4.6.2 or .NET 8.0+

πŸ“₯ Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/chat_with_pdf

Project Link: chat_with_pdf (same path as above)

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a chat model by typing 0-8, or paste a custom model URI.
  2. Wait for models to download (first run) and load.
  3. Select processing mode: 0 for standard extraction, 1 for vision understanding.
  4. Enter the path to a PDF file when prompted.
  5. Optionally load additional documents (answer y to "Load another document?").
  6. Start asking questions about your documents.
  7. Use commands (/help, /status, /add, etc.) as needed.
  8. Press Enter on an empty prompt to exit.

πŸ” Notes on Key Types

Core Classes

  • PdfChat (LMKit.Retrieval) - main class for PDF question-answering:

    • Manages document loading, indexing, and conversation state.
    • Automatically chooses between full-context and passage retrieval.
    • Supports multi-turn conversation with context preservation.
    • Implements IMultiTurnConversation and IDisposable.
  • IVectorStore (LMKit.Data.Storage) - interface for embedding storage:

    • FileSystemVectorStore: built-in filesystem-based implementation.
    • Enables persistent caching of document embeddings across sessions.
    • Implement custom backends for databases or cloud storage.
  • DocumentMetadata - rich metadata for source tracking:

    • Name: display name for the document.
    • SourceUri: original location or reference URL.
    • AdditionalMetadata: custom key-value pairs for attribution.

Result Types

  • DocumentIndexingResult: returned when loading a document:

    • Name: the document name (from metadata or file name).
    • IndexingMode: FullDocument or PassageRetrieval.
    • PageCount: total pages in the document.
    • TokenCount: estimated tokens for the document.
    • ExceededTokenBudget: whether it exceeded the full-context limit.
  • DocumentQueryResult: returned when asking a question:

    • Response: the TextGenerationResult with completion and stats.
    • SourceReferences: list of DocumentReference objects (document name, page number, metadata).
    • HasSourceReferences: whether passages were used (vs. full context).
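
A short sketch consuming these members, using only properties shown elsewhere in this README:

var answer = await chat.SubmitAsync("What does the executive summary say?");

if (answer.HasSourceReferences)
{
    foreach (var source in answer.SourceReferences)
        Console.WriteLine($"  {source.Name}, page {source.PageNumber}");
}
else
{
    Console.WriteLine("  (answered from full-document context)");
}

Console.WriteLine($"  {answer.Response.GeneratedTokens.Count} tokens at " +
                  $"{answer.Response.TokenGenerationRate:F1} tok/s");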

Processing Classes

  • VlmOcr (LMKit.Extraction.Ocr) - vision-based document parser:

    • Analyzes page images to understand structure and content.
    • Set via the chat.DocumentVisionParser property.
  • TesseractOcr (LMKit.Integrations.Tesseract) - OCR fallback engine:

    • Extracts text from image-based pages.
    • Set via the chat.OcrEngine property.
  • RagEngine.RagReranker (LMKit.Retrieval) - cross-encoder reranker:

    • Improves retrieval accuracy with learned relevance scoring.
    • Set via the chat.Reranker property.

Key Events

Event                     | Description
------------------------- | -----------
DocumentImportProgress    | Page processing and embedding generation progress.
CacheAccessed             | Cache hit/miss notifications with document identifier.
PassageRetrievalCompleted | Retrieval results with timing and source references.
ResponseGenerationStarted | Signals when generation begins; includes context mode.
AfterTextCompletion       | Streaming text output during response generation.

⚠️ Troubleshooting

  • "No documents have been loaded"

    • You must load at least one PDF before asking questions.
    • Check the file path and try again.
  • "File not found"

    • Verify the PDF path is correct and the file exists.
    • Try using absolute paths or quoting paths with spaces.
  • Slow document loading

    • First load processes and indexes the document (expected).
    • Subsequent loads use cache and should be near-instant.
    • Try a smaller model or standard extraction mode for faster processing.
  • Out-of-memory or driver errors

    • VRAM insufficient for selected model(s).
    • Pick smaller models (e.g., Qwen 3 2B, Ministral 3 3B).
    • Use standard extraction instead of vision understanding.
  • Poor answer quality

    • Try a larger chat model for better reasoning.
    • Use vision-based document understanding for complex layouts.
    • Increase MaxRetrievedPassages for more context.
    • Lower MinRelevanceScore to include more passages.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Add a Reranker for improved passage selection.
  • Missing passages in answers

    • Document may be using full-context mode (no passage retrieval).
    • Check with /status command to see document modes.
    • For large documents, relevant content may not be in top-k passages.
  • Cache issues

    • Delete the cache directory to force reprocessing.
    • Default location: %LocalAppData%/LMKit/ChatWithPDF/Cache
    • Verify IVectorStore is configured correctly.
  • "Cannot modify this setting after documents have been loaded"

    • Some properties (DocumentProcessingModality) must be set before loading documents.
    • Call ClearDocuments() first, then reconfigure.
  • "DocumentVisionParser must be set when PageProcessingMode is DocumentUnderstanding"

    • Assign a VlmOcr instance before loading documents when using vision mode.
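
For the load-order errors above, clearing documents lets you reconfigure and reload. A minimal sketch using members documented in this README:

// Reconfigure after documents were loaded: clear first, then set, then reload.
chat.ClearDocuments();
chat.DocumentProcessingModality = InferenceModality.Text; // must be set before loading
var reloaded = await chat.LoadDocumentAsync(pdfPath);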

🔧 Extend the Demo

  • Web API integration: expose PdfChat as a REST endpoint for document QA services.
  • Batch processing: process entire document libraries and build searchable knowledge bases.
  • Custom system prompts: tailor the assistant's behavior for specific domains (legal, medical, technical).
  • Custom vector stores: implement IVectorStore for database-backed caching (PostgreSQL, Redis, etc.).
  • Advanced retrieval:
    • Adjust MaxRetrievedPassages and MinRelevanceScore for your use case.
    • Add a Reranker for cross-encoder relevance scoring.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Implement custom reranking logic using the PassageRetrievalCompleted event.
  • Conversation export: save chat history for audit trails or further analysis.
  • Multi-modal responses: combine with image generation or charting for visual answers.
  • Integration with other LM-Kit features:
    • Chain with Text Analysis for entity extraction from answers.
    • Use Structured Extraction to pull specific data fields.
    • Connect to Function Calling for automated document workflows.

📚 Additional Resources