👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/chat_with_pdf

Chat with PDF in .NET Applications


🎯 Purpose of the Sample

Chat with PDF demonstrates how to use LM-Kit.NET to build an intelligent document question-answering system that combines vision-capable models with semantic retrieval to answer questions about PDF documents.

The sample shows how to:

  • Download and load vision models and embedding models with progress callbacks.
  • Create a PdfChat instance that handles document understanding automatically.
  • Configure document caching using the IVectorStore interface for persistent indexing.
  • Choose between text extraction (faster) and vision-based document understanding (better for complex layouts).
  • Load one or multiple PDF documents with automatic caching and indexing.
  • Ask natural language questions and receive grounded, context-aware responses.
  • Monitor document import progress, cache hits, and passage retrieval through events.

Why PDF Chat with LM-Kit.NET?

  • Local-first: all processing runs on your hardware, with no cloud dependencies for sensitive documents.
  • Smart context management: small documents are included in full; large documents use intelligent passage retrieval.
  • Vision-powered: optionally use vision models to understand complex layouts, tables, and scanned pages.
  • Rich telemetry: track document processing, cache utilization, retrieval performance, and generation stats.
  • Multi-document: load multiple PDFs and query across all of them in a single conversation.
  • Flexible caching: use the built-in filesystem cache or implement custom storage backends.

👥 Target Audience

  • Enterprise Developers: build secure, on-premise document QA systems.
  • Legal & Compliance: analyze contracts, policies, and regulatory documents locally.
  • Research & Academia: query research papers, reports, and technical documentation.
  • Back-Office & RPA: automate document analysis workflows without cloud exposure.
  • Demo & Education: explore RAG (Retrieval-Augmented Generation) concepts in a practical C# example.

🚀 Problem Solved

  • Ask questions in natural language: query PDFs like you would ask a knowledgeable assistant.
  • Handle complex layouts: vision-based understanding interprets tables, multi-column text, and forms.
  • Manage large documents: automatic chunking and semantic retrieval for documents that exceed context limits.
  • Multi-document queries: load several PDFs and ask questions that span multiple sources.
  • Performance optimization: built-in caching via IVectorStore eliminates redundant processing on repeated loads.
  • Source transparency: see which passages were used to generate each answer.

💻 Sample Application Description

Console app that:

  • Lets you choose a vision model for chat (or paste a custom model URI).
  • Automatically loads an embedding model for semantic search.
  • Offers two processing modes: standard text extraction or vision-based document understanding.
  • Downloads models if needed, with live progress updates.
  • Creates a PdfChat instance with filesystem-based caching via FileSystemVectorStore.
  • Prompts you to load one or more PDF documents.
  • Enters an interactive chat loop where you can:
    • Ask questions about your documents.
    • See which passages were retrieved and from which pages.
    • View generation statistics (tokens, speed, context usage).
  • Supports commands for managing documents and conversation state.
  • Loops until you press Enter on an empty prompt to quit.

✨ Key Features

  • 📚 Dual processing modes:
    • Standard extraction: fast text-based processing with OCR fallback.
    • Vision understanding: multimodal analysis for complex layouts using VlmOcr.
  • 🔍 Smart retrieval: automatically decides between full-context and passage retrieval based on document size.
  • 💾 Pluggable caching: use FileSystemVectorStore or implement custom IVectorStore backends.
  • 📊 Rich event system:
    • Document import progress (page-by-page processing).
    • Cache hit/miss notifications.
    • Passage retrieval with timing and source references.
    • Response generation status.
  • 💬 Multi-turn conversation: follow-up questions maintain context for natural dialogue.
  • 🔄 Conversation commands: reset, restart, add documents, regenerate, view status.
  • 📈 Generation stats: tokens generated, speed, and context utilization per response.

🧰 Built-In Models (menu)

On startup, the sample shows a model selection menu:

Option   Model                              Approx. VRAM Needed
0        MiniCPM 2.6 o 8.1B                 ~5.9 GB
1        Alibaba Qwen 3 2B (vision)         ~2.5 GB
2        Alibaba Qwen 3 4B (vision)         ~4 GB
3        Alibaba Qwen 3 8B (vision)         ~6.5 GB
4        Google Gemma 3 4B (vision)         ~5.7 GB
5        Google Gemma 3 12B (vision)        ~11 GB
6        Mistral Ministral 3 3B (vision)    ~3.5 GB
7        Mistral Ministral 3 8B (vision)    ~6.5 GB
8        Mistral Ministral 3 14B (vision)   ~12 GB
other    Custom model URI                   depends on model

Any input other than 0-8 is treated as a custom model URI and passed directly to the LM constructor.

Additional models loaded automatically:

  • Embedding model: embeddinggemma-300m - used for semantic passage retrieval.
  • Vision OCR model (when using document understanding): lightonocr1025:1b - lightweight vision model for page analysis.

🧠 Supported Models

The sample is pre-wired to LM-Kit's predefined model cards:

Chat models:

  • minicpm-o
  • qwen3-vl:2b / qwen3-vl:4b / qwen3-vl:8b
  • gemma3:4b / gemma3:12b
  • ministral3:3b / ministral3:8b / ministral3:14b

Embedding model:

  • embeddinggemma-300m

Document understanding model:

  • lightonocr1025:1b

Internally:

// Chat model selection
modelUri = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri;

// Embedding model (auto-loaded)
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

// Vision OCR for document understanding
var visionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.
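
For example (the URI below is hypothetical), a pasted input is wrapped in a Uri and handed to the LM constructor exactly as the built-in options are:

// Hypothetical custom model URI; any reachable model endpoint or local path works
var chatModel = new LM(new Uri("https://example.com/models/my-vision-model.gguf"));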


🛠️ Commands & Flow

Startup Flow

  1. Model selection: choose a vision-capable chat model (0-8) or paste a custom URI.
  2. Model loading: chat and embedding models download (if needed) and load with progress reporting.
  3. Processing mode selection:
    • 0 - Standard text extraction (faster, uses Tesseract OCR as fallback).
    • 1 - Vision-based document understanding (better for complex layouts, loads additional vision model).
  4. Document loading: prompted to enter PDF path(s). Can load multiple documents.
  5. Chat loop: ask questions, receive answers with source references.

Interactive Commands

Inside the chat loop, type these commands instead of a question:

Command      Description
/help        Show all available commands with descriptions.
/status      Display loaded documents, token usage, and configuration details.
/add         Add more PDF documents to the current collection (clears chat history).
/restart     Clear chat history but keep all loaded documents.
/reset       Remove all documents and clear chat history (prompts for new documents).
/regenerate  Generate a new response to your last question.
(empty)      Press Enter on an empty prompt to exit the application.

Per-Question Flow

  1. Type your question and press Enter.
  2. Passage retrieval (if using chunked documents): relevant passages are retrieved with timing info.
  3. Response generation: the model generates an answer, streamed to console in real-time.
  4. Stats display: tokens generated, generation speed, and context utilization.
  5. Repeat or use a command.
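
Programmatically, this whole loop reduces to a single SubmitAsync call plus the stats carried on the result, mirroring the Minimal Integration Snippet later in this guide:

var response = await chat.SubmitAsync("What are the termination conditions?");

Console.WriteLine();
Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");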

🗣️ Example Use Cases

Try the sample with:

  • A contract or legal agreement → ask "What are the termination conditions?" or "Summarize the payment terms."
  • A research paper → ask "What methodology did the authors use?" or "What were the main findings?"
  • A technical manual → ask "How do I configure the network settings?" or "What are the system requirements?"
  • A financial report → ask "What was the revenue growth year-over-year?" or "Summarize the risk factors."
  • Multiple related documents → load several PDFs and ask "Compare the approaches described in these papers."

After each response, inspect:

  • Retrieved passages: do they match the most relevant sections?
  • Source references: correct document and page numbers?
  • Generation stats: acceptable latency for your use case?

📊 Document Processing Modes

Standard Text Extraction

Processing Mode: 0 - Standard text extraction (faster)
  • Extracts text directly from PDF structure.
  • Uses Tesseract OCR as fallback for scanned/image-based pages.
  • Best for: clean PDFs with simple layouts, text-heavy documents.

Vision-Based Document Understanding

Processing Mode: 1 - Vision-based document understanding (better for complex layouts)
  • Loads an additional vision model (lightonocr1025:1b).
  • Analyzes pages visually to understand layout, structure, and relationships.
  • Best for: multi-column layouts, tables, forms, scanned documents, mixed content.
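
In code, the two modes correspond to the PdfChat configuration below, mirroring the wiring in the Minimal Integration Snippet (see Notes on Key Types for the properties involved):

// Mode 1: vision-based document understanding (loads an extra vision model)
chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

// Mode 0 (default): standard text extraction; assign a TesseractOcr instance
// to chat.OcrEngine to provide the fallback for image-based pages.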

📦 Document Indexing Modes

When you load a document, PdfChat automatically decides how to process it:

Full Document Mode

  • When: document fits within FullDocumentTokenBudget (default: 4096 tokens).
  • Behavior: entire document is included in every query context.
  • Advantage: model has complete context, best for small documents.
  • Trade-off: uses more context space, limiting room for conversation history.

Passage Retrieval Mode

  • When: document exceeds the token budget.
  • Behavior: document is chunked and indexed for semantic search.
  • Advantage: handles documents of any size efficiently.
  • Trade-off: only relevant passages are included (may miss tangentially related content).

The sample reports which mode was used after loading:

✓ Loaded: report.pdf
    Pages: 45
    Tokens: 28,500
    Mode: passage retrieval
    ⚠ Exceeded token budget → using passage retrieval
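
The same information is available programmatically on the DocumentIndexingResult returned by LoadDocumentAsync (property names as listed under Result Types below):

var result = await chat.LoadDocumentAsync("report.pdf");

Console.WriteLine($"Loaded: {result.Name}");
Console.WriteLine($"  Pages: {result.PageCount}");
Console.WriteLine($"  Tokens: {result.TokenCount:N0}");
Console.WriteLine($"  Mode: {result.IndexingMode}");

if (result.ExceededTokenBudget)
    Console.WriteLine("  ⚠ Exceeded token budget → using passage retrieval");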

💾 Document Caching with IVectorStore

PdfChat supports persistent caching of processed documents through the IVectorStore interface. This eliminates redundant processing when the same document is loaded across sessions.

FileSystemVectorStore (Built-in)

The sample uses FileSystemVectorStore for local filesystem caching:

using LMKit.Data.Storage;

string cacheDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "LMKit", "ChatWithPDF", "Cache");

IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

// Pass to PdfChat constructor
PdfChat chat = new PdfChat(chatModel, embeddingModel, vectorStore);

Cache Behavior

  • On cache hit: document loads instantly from pre-indexed data.
  • On cache miss: full processing (page extraction, embedding generation) occurs, then results are cached.

Monitor cache access via the CacheAccessed event:

chat.CacheAccessed += (sender, e) =>
{
    if (e.IsHit)
        Console.WriteLine($"Cache hit: {e.DocumentName} loaded instantly");
    else
        Console.WriteLine($"Cache miss: {e.DocumentName} will be processed");
};

Disabling Caching

To disable caching, omit the vectorStore parameter:

// No caching - documents processed fresh each time
PdfChat chat = new PdfChat(chatModel, embeddingModel);

Custom Vector Store Implementations

Implement IVectorStore for custom backends (databases, cloud storage, etc.):

public class CustomVectorStore : IVectorStore
{
    // Implement the interface methods against your storage backend, for example:
    public Task<bool> CollectionExistsAsync(string collectionId, CancellationToken ct)
        => throw new NotImplementedException(); // replace with a lookup in your backend

    // ... other IVectorStore methods
}

📋 Document Metadata

When loading documents, you can attach rich metadata for source tracking and attribution:

DocumentMetadata Class

var metadata = new DocumentMetadata("Q4 Financial Report")
{
    SourceUri = "https://intranet.example.com/docs/q4-report.pdf",
    AdditionalMetadata = new MetadataCollection
    {
        { "author", "Finance Team" },
        { "department", "Corporate Finance" },
        { "confidentiality", "Internal" },
        { "fiscal_year", "2024" }
    }
};

var result = await chat.LoadDocumentAsync("report.pdf", metadata);

Accessing Metadata in Query Results

Source references include the metadata you specified:

var response = await chat.SubmitAsync("What was the Q4 revenue?");

foreach (var source in response.SourceReferences)
{
    Console.WriteLine($"Source: {source.Name}, Page {source.PageNumber}");
    
    if (source.Metadata?.AdditionalMetadata != null)
    {
        if (source.Metadata.AdditionalMetadata.TryGet("author", out var author))
            Console.WriteLine($"  Author: {author.Value}");
        if (source.Metadata.AdditionalMetadata.TryGet("department", out var dept))
            Console.WriteLine($"  Department: {dept.Value}");
    }
}

Default Metadata

If no metadata is provided, PdfChat creates default metadata using the file name:

// Equivalent to: new DocumentMetadata(Path.GetFileName(filePath))
var result = chat.LoadDocument("report.pdf");

🔧 Advanced Configuration

Reranking for Improved Retrieval

Use a cross-encoder reranker to improve passage retrieval accuracy:

using LMKit.Retrieval;

// Load a reranker model
var rerankerModel = LM.LoadFromModelID("reranker-model-id");

// Configure reranking
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);

The RerankAlpha parameter (0.0–1.0) controls blending between raw embedding similarity and rerank score:

  • 0.0: use only embedding similarity
  • 1.0: use only rerank score
  • 0.7 (recommended): blend favoring rerank score
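
Assuming a standard linear blend, the combined score is presumably score = (1 − α) · embedding_similarity + α · rerank_score, so α = 0.7 lets the cross-encoder drive 70% of the final ranking while raw embedding similarity still breaks ties.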

Visual Grounding with Page Renderings

Include page images alongside retrieved passages for visual context:

// Enable page renderings in retrieval context
chat.IncludePageRenderingsInContext = true;

When enabled:

  • Page images corresponding to retrieved passages are injected into the context.
  • Allows the model to visually interpret tables, charts, and figures.
  • Increases token consumption proportionally to unique pages referenced.
  • Requires a vision-capable chat model for best results.

Document Processing Modality

Control whether pages are processed as images or text-only:

// Multimodal: pages may be processed as images (default)
chat.DocumentProcessingModality = InferenceModality.Multimodal;

// Text-only: use extracted text only
chat.DocumentProcessingModality = InferenceModality.Text;

Note: This property must be set before loading documents.

Retrieval Parameters

Fine-tune passage retrieval behavior:

// Maximum passages retrieved per query (default: 5)
chat.MaxRetrievedPassages = 10;

// Minimum relevance score threshold (default: 0.5)
// Higher = fewer but more precise matches
chat.MinRelevanceScore = 0.6f;

// Token budget for full-document mode (default: 4096)
// Documents exceeding this use passage retrieval
chat.FullDocumentTokenBudget = 8192;

// Prefer including small documents in full (default: true)
chat.PreferFullDocumentContext = true;

Reasoning and Sampling

Configure response generation behavior:

// Enable extended reasoning for complex questions
chat.ReasoningLevel = ReasoningLevel.Extended;

// Use temperature-based sampling for varied responses
chat.SamplingMode = new RandomSampling { Temperature = 0.7f };

// Or stick with deterministic outputs (default)
chat.SamplingMode = new GreedyDecoding();

// Maximum tokens per response (default: 2048)
chat.MaximumCompletionTokens = 4096;

⚙️ Behavior & Policies (quick reference)

  • Model selection: exactly one chat model per process. To change models, restart the app.
  • Download & load:
    • ModelDownloadingProgress prints Downloading model XX.XX% or byte counts.
    • ModelLoadingProgress prints Loading model XX% and clears the console once done.
  • Document caching:
    • Uses IVectorStore interface (sample uses FileSystemVectorStore).
    • Default cache location: %LocalAppData%\LMKit\ChatWithPDF\Cache
    • On cache hit: instant load from pre-indexed data.
    • On cache miss: full processing (page extraction, embedding generation).
  • Passage retrieval:
    • Default MaxRetrievedPassages: 5
    • Default MinRelevanceScore: 0.5
    • Results sorted by document, then page number, then partition index.
  • Response generation:
    • Streaming output via AfterTextCompletion event.
    • Internal reasoning shown in dark blue.
    • Tool invocations shown in dark yellow.
  • Licensing:
    • You can set an optional license key via LicenseManager.SetLicenseKey("").
    • A free community license is available from the LM-Kit website.
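
A minimal sketch of wiring these progress callbacks when constructing the chat model; the constructor parameter names and delegate signatures below are assumptions based on the sample's output format, so check the sample source for the exact API:

// Sketch: progress reporting during model download and load (signatures assumed)
var chatModel = new LM(new Uri(modelUri),
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue)
            Console.Write($"\rDownloading model {bytesRead * 100.0 / contentLength.Value:F2}%");
        else
            Console.Write($"\rDownloading model {bytesRead:N0} bytes"); // byte count when size is unknown
        return true; // true = continue downloading
    },
    loadingProgress: progress =>
    {
        Console.Write($"\rLoading model {progress * 100:F0}%");
        return true; // true = continue loading
    });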

💻 Minimal Integration Snippet

using LMKit.Data.Storage;
using LMKit.Extraction.Ocr;
using LMKit.Model;
using LMKit.Retrieval;

public class PdfChatSample
{
    public async Task RunChat(string modelUri, string pdfPath)
    {
        // Load models
        var chatModel = new LM(new Uri(modelUri));
        var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

        // Configure caching with vector store
        string cacheDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "LMKit", "PdfChat", "Cache");
        IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

        // Create PdfChat instance with caching
        using var chat = new PdfChat(chatModel, embeddingModel, vectorStore)
        {
            PreferFullDocumentContext = true
        };

        // Optional: enable vision-based document understanding
        chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
        chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

        // Optional: include page images in retrieval context
        // chat.IncludePageRenderingsInContext = true;

        // Subscribe to events
        chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);
        chat.PassageRetrievalCompleted += (s, e) =>
            Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

        // Load document with optional metadata
        var metadata = new DocumentMetadata(Path.GetFileName(pdfPath))
        {
            SourceUri = pdfPath
        };
        var result = await chat.LoadDocumentAsync(pdfPath, metadata);
        Console.WriteLine($"Loaded: {result.Name} ({result.IndexingMode})");

        // Ask questions
        var response = await chat.SubmitAsync("What are the key points in this document?");
        
        Console.WriteLine();
        Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
        Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");
    }
}

Use this pattern to integrate PDF chat into web APIs, desktop apps, or document processing pipelines.


🛠️ Getting Started

📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 8.0+

📥 Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/chat_with_pdf

Project Link: chat_with_pdf (same path as above)

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a chat model by typing 0-8, or paste a custom model URI.
  2. Wait for models to download (first run) and load.
  3. Select processing mode: 0 for standard extraction, 1 for vision understanding.
  4. Enter the path to a PDF file when prompted.
  5. Optionally load additional documents (answer y to "Load another document?").
  6. Start asking questions about your documents.
  7. Use commands (/help, /status, /add, etc.) as needed.
  8. Press Enter on an empty prompt to exit.

🔍 Notes on Key Types

Core Classes

  • PdfChat (LMKit.Retrieval) - main class for PDF question-answering:

    • Manages document loading, indexing, and conversation state.
    • Automatically chooses between full-context and passage retrieval.
    • Supports multi-turn conversation with context preservation.
    • Implements IMultiTurnConversation and IDisposable.
  • IVectorStore (LMKit.Data.Storage) - interface for embedding storage:

    • FileSystemVectorStore: built-in filesystem-based implementation.
    • Enables persistent caching of document embeddings across sessions.
    • Implement custom backends for databases or cloud storage.
  • DocumentMetadata - rich metadata for source tracking:

    • Name: display name for the document.
    • SourceUri: original location or reference URL.
    • AdditionalMetadata: custom key-value pairs for attribution.

Result Types

  • DocumentIndexingResult: returned when loading a document:

    • Name: the document name (from metadata or file name).
    • IndexingMode: FullDocument or PassageRetrieval.
    • PageCount: total pages in the document.
    • TokenCount: estimated tokens for the document.
    • ExceededTokenBudget: whether it exceeded the full-context limit.
  • DocumentQueryResult: returned when asking a question:

    • Response: the TextGenerationResult with completion and stats.
    • SourceReferences: list of DocumentReference objects (document name, page number, metadata).
    • HasSourceReferences: whether passages were used (vs. full context).
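
Since HasSourceReferences distinguishes passage-retrieval answers from full-context answers, a small branch (a sketch using only the properties above) makes the difference visible:

var answer = await chat.SubmitAsync("Summarize the risk factors.");

if (answer.HasSourceReferences)
{
    foreach (var src in answer.SourceReferences)
        Console.WriteLine($"  {src.Name}, page {src.PageNumber}");
}
else
{
    Console.WriteLine("  (answered from full-document context)");
}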

Processing Classes

  • VlmOcr (LMKit.Extraction.Ocr) - vision-based document parser:

    • Analyzes page images to understand structure and content.
    • Set via chat.DocumentVisionParser property.
  • TesseractOcr (LMKit.Integrations.Tesseract) - OCR fallback engine:

    • Extracts text from image-based pages.
    • Set via chat.OcrEngine property.
  • RagEngine.RagReranker (LMKit.Retrieval) - cross-encoder reranker:

    • Improves retrieval accuracy with learned relevance scoring.
    • Set via chat.Reranker property.

Key Events

Event                      Description
DocumentImportProgress     Page processing and embedding generation progress.
CacheAccessed              Cache hit/miss notifications with document identifier.
PassageRetrievalCompleted  Retrieval results with timing and source references.
ResponseGenerationStarted  Signals when generation begins; includes context mode.
AfterTextCompletion        Streaming text output during response generation.
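
A sketch subscribing to these events; the CacheAccessed, PassageRetrievalCompleted, and AfterTextCompletion handlers use argument properties shown earlier in this guide, while the remaining two argument shapes are assumptions:

chat.CacheAccessed += (s, e) =>
    Console.WriteLine(e.IsHit ? $"Cache hit: {e.DocumentName}" : $"Cache miss: {e.DocumentName}");

chat.PassageRetrievalCompleted += (s, e) =>
    Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);

// Argument properties for these two are assumptions; inspect the event args in your IDE.
chat.DocumentImportProgress += (s, e) => { /* report per-page progress */ };
chat.ResponseGenerationStarted += (s, e) => { /* log the context mode */ };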

⚠️ Troubleshooting

  • "No documents have been loaded"

    • You must load at least one PDF before asking questions.
    • Check the file path and try again.
  • "File not found"

    • Verify the PDF path is correct and the file exists.
    • Try using absolute paths or quoting paths with spaces.
  • Slow document loading

    • First load processes and indexes the document (expected).
    • Subsequent loads use cache and should be near-instant.
    • Try a smaller model or standard extraction mode for faster processing.
  • Out-of-memory or driver errors

    • VRAM insufficient for selected model(s).
    • Pick smaller models (e.g., Qwen 3 2B, Ministral 3 3B).
    • Use standard extraction instead of vision understanding.
  • Poor answer quality

    • Try a larger chat model for better reasoning.
    • Use vision-based document understanding for complex layouts.
    • Increase MaxRetrievedPassages for more context.
    • Lower MinRelevanceScore to include more passages.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Add a Reranker for improved passage selection.
  • Missing passages in answers

    • Document may be using full-context mode (no passage retrieval).
    • Check with /status command to see document modes.
    • For large documents, relevant content may not be in top-k passages.
  • Cache issues

    • Delete the cache directory to force reprocessing.
    • Default location: %LocalAppData%\LMKit\ChatWithPDF\Cache
    • Verify IVectorStore is configured correctly.
  • "Cannot modify this setting after documents have been loaded"

    • Some properties (DocumentProcessingModality) must be set before loading documents.
    • Call ClearDocuments() first, then reconfigure.
  • "DocumentVisionParser must be set when PageProcessingMode is DocumentUnderstanding"

    • Assign a VlmOcr instance before loading documents when using vision mode.

🔧 Extend the Demo

  • Web API integration: expose PdfChat as a REST endpoint for document QA services.
  • Batch processing: process entire document libraries and build searchable knowledge bases.
  • Custom system prompts: tailor the assistant's behavior for specific domains (legal, medical, technical).
  • Custom vector stores: implement IVectorStore for database-backed caching (PostgreSQL, Redis, etc.).
  • Advanced retrieval:
    • Adjust MaxRetrievedPassages and MinRelevanceScore for your use case.
    • Add a Reranker for cross-encoder relevance scoring.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Implement custom reranking logic using the PassageRetrievalCompleted event.
  • Conversation export: save chat history for audit trails or further analysis.
  • Multi-modal responses: combine with image generation or charting for visual answers.
  • Integration with other LM-Kit features:
    • Chain with Text Analysis for entity extraction from answers.
    • Use Structured Extraction to pull specific data fields.
    • Connect to Function Calling for automated document workflows.

📚 Additional Resources