👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/chat_with_pdf

Chat with PDF in .NET Applications


🎯 Purpose of the Sample

Chat with PDF demonstrates how to use LM-Kit.NET to build an intelligent document question-answering system that combines vision-capable models with semantic retrieval to answer questions about PDF documents.

The sample shows how to:

  • Download and load vision models and embedding models with progress callbacks.
  • Create a PdfChat instance that handles document understanding automatically.
  • Configure document caching using the IVectorStore interface for persistent indexing.
  • Choose between text extraction (faster) and vision-based document understanding (better for complex layouts).
  • Load one or multiple PDF documents with automatic caching and indexing.
  • Ask natural language questions and receive grounded, context-aware responses.
  • Monitor document import progress, cache hits, and passage retrieval through events.

Why PDF Chat with LM-Kit.NET?

  • Local-first: all processing runs on your hardware, with no cloud dependencies for sensitive documents.
  • Smart context management: small documents are included in full; large documents use intelligent passage retrieval.
  • Vision-powered: optionally use vision models to understand complex layouts, tables, and scanned pages.
  • Rich telemetry: track document processing, cache utilization, retrieval performance, and generation stats.
  • Multi-document: load multiple PDFs and query across all of them in a single conversation.
  • Flexible caching: use the built-in filesystem cache or implement custom storage backends.

👥 Target Audience

  • Enterprise Developers: build secure, on-premise document QA systems.
  • Legal & Compliance: analyze contracts, policies, and regulatory documents locally.
  • Research & Academia: query research papers, reports, and technical documentation.
  • Back-Office & RPA: automate document analysis workflows without cloud exposure.
  • Demo & Education: explore RAG (Retrieval-Augmented Generation) concepts in a practical C# example.

🚀 Problem Solved

  • Ask questions in natural language: query PDFs like you would ask a knowledgeable assistant.
  • Handle complex layouts: vision-based understanding interprets tables, multi-column text, and forms.
  • Manage large documents: automatic chunking and semantic retrieval for documents that exceed context limits.
  • Multi-document queries: load several PDFs and ask questions that span multiple sources.
  • Performance optimization: built-in caching via IVectorStore eliminates redundant processing on repeated loads.
  • Source transparency: see which passages were used to generate each answer.

💻 Sample Application Description

Console app that:

  • Lets you choose a vision model for chat (or paste a custom model URI).
  • Automatically loads an embedding model for semantic search.
  • Offers two processing modes: standard text extraction or vision-based document understanding.
  • Downloads models if needed, with live progress updates.
  • Creates a PdfChat instance with filesystem-based caching via FileSystemVectorStore.
  • Prompts you to load one or more PDF documents.
  • Enters an interactive chat loop where you can:
    • Ask questions about your documents.
    • See which passages were retrieved and from which pages.
    • View generation statistics (tokens, speed, context usage).
  • Supports commands for managing documents and conversation state.
  • Loops until you press Enter on an empty prompt to quit.

✨ Key Features

  • 📚 Dual processing modes:
    • Standard extraction: fast text-based processing with OCR fallback.
    • Vision understanding: multimodal analysis for complex layouts using VlmOcr.
  • 🔍 Smart retrieval: automatically decides between full-context and passage retrieval based on document size.
  • 💾 Pluggable caching: use FileSystemVectorStore or implement custom IVectorStore backends.
  • 📊 Rich event system:
    • Document import progress (page-by-page processing).
    • Cache hit/miss notifications.
    • Passage retrieval with timing and source references.
    • Response generation status.
  • 💬 Multi-turn conversation: follow-up questions maintain context for natural dialogue.
  • 🔄 Conversation commands: reset, restart, add documents, regenerate, view status.
  • 📈 Generation stats: tokens generated, speed, and context utilization per response.

🧰 Built-In Models (menu)

On startup, the sample shows a model selection menu:

Option   Model                              Approx. VRAM Needed
0        MiniCPM 2.6 o 8.1B                 ~5.9 GB
1        Alibaba Qwen 3 2B (vision)         ~2.5 GB
2        Alibaba Qwen 3 4B (vision)         ~4 GB
3        Alibaba Qwen 3 8B (vision)         ~6.5 GB
4        Google Gemma 3 4B (vision)         ~5.7 GB
5        Google Gemma 3 12B (vision)        ~11 GB
6        Mistral Ministral 3 3B (vision)    ~3.5 GB
7        Mistral Ministral 3 8B (vision)    ~6.5 GB
8        Mistral Ministral 3 14B (vision)   ~12 GB
other    Custom model URI                   depends on model

Any input other than 0-8 is treated as a custom model URI and passed directly to the LM constructor.

Additional models loaded automatically:

  • Embedding model: embeddinggemma-300m - used for semantic passage retrieval.
  • Vision OCR model (when using document understanding): lightonocr1025:1b - lightweight vision model for page analysis.

🧠 Supported Models

The sample is pre-wired to LM-Kit's predefined model cards:

Chat models:

  • minicpm-o
  • qwen3-vl:2b / qwen3-vl:4b / qwen3-vl:8b
  • gemma3:4b / gemma3:12b
  • ministral3:3b / ministral3:8b / ministral3:14b

Embedding model:

  • embeddinggemma-300m

Document understanding model:

  • lightonocr1025:1b

Internally:

// Chat model selection
modelUri = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri;

// Embedding model (auto-loaded)
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

// Vision OCR for document understanding
var visionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.
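
For example (the URI below is hypothetical), a pasted input is wrapped in a Uri and handed to the LM constructor exactly as the built-in options are:

// Hypothetical custom model URI; any reachable model endpoint or local path works
var chatModel = new LM(new Uri("https://example.com/models/my-vision-model.gguf"));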


🛠️ Commands & Flow

Startup Flow

  1. Model selection: choose a vision-capable chat model (0-8) or paste a custom URI.
  2. Model loading: chat and embedding models download (if needed) and load with progress reporting.
  3. Processing mode selection:
    • 0 - Standard text extraction (faster, uses Tesseract OCR as fallback).
    • 1 - Vision-based document understanding (better for complex layouts, loads additional vision model).
  4. Document loading: prompted to enter PDF path(s). Can load multiple documents.
  5. Chat loop: ask questions, receive answers with source references.

Interactive Commands

Inside the chat loop, type these commands instead of a question:

Command      Description
/help        Show all available commands with descriptions.
/status      Display loaded documents, token usage, and configuration details.
/add         Add more PDF documents to the current collection (clears chat history).
/restart     Clear chat history but keep all loaded documents.
/reset       Remove all documents and clear chat history (prompts for new documents).
/regenerate  Generate a new response to your last question.
(empty)      Press Enter on an empty prompt to exit the application.

Per-Question Flow

  1. Type your question and press Enter.
  2. Passage retrieval (if using chunked documents): relevant passages are retrieved with timing info.
  3. Response generation: the model generates an answer, streamed to console in real-time.
  4. Stats display: tokens generated, generation speed, and context utilization.
  5. Repeat or use a command.
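
Programmatically, this whole loop reduces to a single SubmitAsync call plus the stats carried on the result, mirroring the Minimal Integration Snippet later in this guide:

var response = await chat.SubmitAsync("What are the termination conditions?");

Console.WriteLine();
Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");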

🗣️ Example Use Cases

Try the sample with:

  • A contract or legal agreement → ask "What are the termination conditions?" or "Summarize the payment terms."
  • A research paper → ask "What methodology did the authors use?" or "What were the main findings?"
  • A technical manual → ask "How do I configure the network settings?" or "What are the system requirements?"
  • A financial report → ask "What was the revenue growth year-over-year?" or "Summarize the risk factors."
  • Multiple related documents → load several PDFs and ask "Compare the approaches described in these papers."

After each response, inspect:

  • Retrieved passages: do they match the most relevant sections?
  • Source references: correct document and page numbers?
  • Generation stats: acceptable latency for your use case?

📊 Document Processing Modes

Standard Text Extraction

Processing Mode: 0 - Standard text extraction (faster)
  • Extracts text directly from PDF structure.
  • Uses Tesseract OCR as fallback for scanned/image-based pages.
  • Best for: clean PDFs with simple layouts, text-heavy documents.

Vision-Based Document Understanding

Processing Mode: 1 - Vision-based document understanding (better for complex layouts)
  • Loads an additional vision model (lightonocr1025:1b).
  • Analyzes pages visually to understand layout, structure, and relationships.
  • Best for: multi-column layouts, tables, forms, scanned documents, mixed content.
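
In code, the two modes correspond to the PdfChat configuration below, mirroring the wiring in the Minimal Integration Snippet (see Notes on Key Types for the properties involved):

// Mode 1: vision-based document understanding (loads an extra vision model)
chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

// Mode 0 (default): standard text extraction; assign a TesseractOcr instance
// to chat.OcrEngine to provide the fallback for image-based pages.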

📦 Document Indexing Modes

When you load a document, PdfChat automatically decides how to process it:

Full Document Mode

  • When: document fits within FullDocumentTokenBudget (default: 4096 tokens).
  • Behavior: entire document is included in every query context.
  • Advantage: model has complete context, best for small documents.
  • Trade-off: uses more context space, limiting room for conversation history.

Passage Retrieval Mode

  • When: document exceeds the token budget.
  • Behavior: document is chunked and indexed for semantic search.
  • Advantage: handles documents of any size efficiently.
  • Trade-off: only relevant passages are included (may miss tangentially related content).

The sample reports which mode was used after loading:

✓ Loaded: report.pdf
    Pages: 45
    Tokens: 28,500
    Mode: passage retrieval
    ⚠ Exceeded token budget → using passage retrieval
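
The same information is available programmatically on the DocumentIndexingResult returned by LoadDocumentAsync (property names as listed under Result Types below):

var result = await chat.LoadDocumentAsync("report.pdf");

Console.WriteLine($"Loaded: {result.Name}");
Console.WriteLine($"  Pages: {result.PageCount}");
Console.WriteLine($"  Tokens: {result.TokenCount:N0}");
Console.WriteLine($"  Mode: {result.IndexingMode}");

if (result.ExceededTokenBudget)
    Console.WriteLine("  ⚠ Exceeded token budget → using passage retrieval");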

💾 Document Caching with IVectorStore

PdfChat supports persistent caching of processed documents through the IVectorStore interface. This eliminates redundant processing when the same document is loaded across sessions.

FileSystemVectorStore (Built-in)

The sample uses FileSystemVectorStore for local filesystem caching:

using LMKit.Data.Storage;

string cacheDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "LMKit", "ChatWithPDF", "Cache");

IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

// Pass to PdfChat constructor
PdfChat chat = new PdfChat(chatModel, embeddingModel, vectorStore);

Cache Behavior

  • On cache hit: document loads instantly from pre-indexed data.
  • On cache miss: full processing (page extraction, embedding generation) occurs, then results are cached.

Monitor cache access via the CacheAccessed event:

chat.CacheAccessed += (sender, e) =>
{
    if (e.IsHit)
        Console.WriteLine($"Cache hit: {e.DocumentName} loaded instantly");
    else
        Console.WriteLine($"Cache miss: {e.DocumentName} will be processed");
};

Disabling Caching

To disable caching, omit the vectorStore parameter:

// No caching - documents processed fresh each time
PdfChat chat = new PdfChat(chatModel, embeddingModel);

Custom Vector Store Implementations

Implement IVectorStore for custom backends (databases, cloud storage, etc.):

public class CustomVectorStore : IVectorStore
{
    // Implement the interface methods against your storage backend, for example:
    public Task<bool> CollectionExistsAsync(string collectionId, CancellationToken ct)
        => throw new NotImplementedException(); // replace with a lookup in your backend

    // ... other IVectorStore methods
}

📋 Document Metadata

When loading documents, you can attach rich metadata for source tracking and attribution:

DocumentMetadata Class

var metadata = new DocumentMetadata("Q4 Financial Report")
{
    SourceUri = "https://intranet.example.com/docs/q4-report.pdf",
    AdditionalMetadata = new MetadataCollection
    {
        { "author", "Finance Team" },
        { "department", "Corporate Finance" },
        { "confidentiality", "Internal" },
        { "fiscal_year", "2024" }
    }
};

var result = await chat.LoadDocumentAsync("report.pdf", metadata);

Accessing Metadata in Query Results

Source references include the metadata you specified:

var response = await chat.SubmitAsync("What was the Q4 revenue?");

foreach (var source in response.SourceReferences)
{
    Console.WriteLine($"Source: {source.Name}, Page {source.PageNumber}");
    
    if (source.Metadata?.AdditionalMetadata != null)
    {
        if (source.Metadata.AdditionalMetadata.TryGet("author", out var author))
            Console.WriteLine($"  Author: {author.Value}");
        if (source.Metadata.AdditionalMetadata.TryGet("department", out var dept))
            Console.WriteLine($"  Department: {dept.Value}");
    }
}

Default Metadata

If no metadata is provided, PdfChat creates default metadata using the file name:

// Equivalent to: new DocumentMetadata(Path.GetFileName(filePath))
var result = chat.LoadDocument("report.pdf");

🔧 Advanced Configuration

Reranking for Improved Retrieval

Use a cross-encoder reranker to improve passage retrieval accuracy:

using LMKit.Retrieval;

// Load a reranker model
var rerankerModel = LM.LoadFromModelID("reranker-model-id");

// Configure reranking
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);

The RerankAlpha parameter (0.0–1.0) controls blending between raw embedding similarity and rerank score:

  • 0.0: use only embedding similarity
  • 1.0: use only rerank score
  • 0.7 (recommended): blend favoring rerank score
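
Assuming a standard linear blend, the combined score is presumably score = (1 − α) · embedding_similarity + α · rerank_score, so α = 0.7 lets the cross-encoder drive 70% of the final ranking while raw embedding similarity still breaks ties.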

Visual Grounding with Page Renderings

Include page images alongside retrieved passages for visual context:

// Enable page renderings in retrieval context
chat.IncludePageRenderingsInContext = true;

When enabled:

  • Page images corresponding to retrieved passages are injected into the context.
  • Allows the model to visually interpret tables, charts, and figures.
  • Increases token consumption proportionally to unique pages referenced.
  • Requires a vision-capable chat model for best results.

Document Processing Modality

Control whether pages are processed as images or text-only:

// Multimodal: pages may be processed as images (default)
chat.DocumentProcessingModality = InferenceModality.Multimodal;

// Text-only: use extracted text only
chat.DocumentProcessingModality = InferenceModality.Text;

Note: This property must be set before loading documents.

Retrieval Parameters

Fine-tune passage retrieval behavior:

// Maximum passages retrieved per query (default: 5)
chat.MaxRetrievedPassages = 10;

// Minimum relevance score threshold (default: 0.5)
// Higher = fewer but more precise matches
chat.MinRelevanceScore = 0.6f;

// Token budget for full-document mode (default: 4096)
// Documents exceeding this use passage retrieval
chat.FullDocumentTokenBudget = 8192;

// Prefer including small documents in full (default: true)
chat.PreferFullDocumentContext = true;

Reasoning and Sampling

Configure response generation behavior:

// Enable extended reasoning for complex questions
chat.ReasoningLevel = ReasoningLevel.Extended;

// Use temperature-based sampling for varied responses
chat.SamplingMode = new RandomSampling { Temperature = 0.7f };

// Or stick with deterministic outputs (default)
chat.SamplingMode = new GreedyDecoding();

// Maximum tokens per response (default: 2048)
chat.MaximumCompletionTokens = 4096;

⚙️ Behavior & Policies (quick reference)

  • Model selection: exactly one chat model per process. To change models, restart the app.
  • Download & load:
    • ModelDownloadingProgress prints Downloading model XX.XX% or byte counts.
    • ModelLoadingProgress prints Loading model XX% and clears the console once done.
  • Document caching:
    • Uses IVectorStore interface (sample uses FileSystemVectorStore).
    • Default cache location: %LocalAppData%\LMKit\ChatWithPDF\Cache
    • On cache hit: instant load from pre-indexed data.
    • On cache miss: full processing (page extraction, embedding generation).
  • Passage retrieval:
    • Default MaxRetrievedPassages: 5
    • Default MinRelevanceScore: 0.5
    • Results sorted by document, then page number, then partition index.
  • Response generation:
    • Streaming output via AfterTextCompletion event.
    • Internal reasoning shown in dark blue.
    • Tool invocations shown in dark yellow.
  • Licensing:
    • You can set an optional license key via LicenseManager.SetLicenseKey("").
    • A free community license is available from the LM-Kit website.
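
A minimal sketch of wiring these progress callbacks when constructing the chat model; the constructor parameter names and delegate signatures below are assumptions based on the sample's output format, so check the sample source for the exact API:

// Sketch: progress reporting during model download and load (signatures assumed)
var chatModel = new LM(new Uri(modelUri),
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue)
            Console.Write($"\rDownloading model {bytesRead * 100.0 / contentLength.Value:F2}%");
        else
            Console.Write($"\rDownloading model {bytesRead:N0} bytes"); // byte count when size is unknown
        return true; // true = continue downloading
    },
    loadingProgress: progress =>
    {
        Console.Write($"\rLoading model {progress * 100:F0}%");
        return true; // true = continue loading
    });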

💻 Minimal Integration Snippet

using LMKit.Data.Storage;
using LMKit.Extraction.Ocr;
using LMKit.Model;
using LMKit.Retrieval;

public class PdfChatSample
{
    public async Task RunChat(string modelUri, string pdfPath)
    {
        // Load models
        var chatModel = new LM(new Uri(modelUri));
        var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

        // Configure caching with vector store
        string cacheDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "LMKit", "PdfChat", "Cache");
        IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

        // Create PdfChat instance with caching
        using var chat = new PdfChat(chatModel, embeddingModel, vectorStore)
        {
            PreferFullDocumentContext = true
        };

        // Optional: enable vision-based document understanding
        chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
        chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

        // Optional: include page images in retrieval context
        // chat.IncludePageRenderingsInContext = true;

        // Subscribe to events
        chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);
        chat.PassageRetrievalCompleted += (s, e) =>
            Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

        // Load document with optional metadata
        var metadata = new DocumentMetadata(Path.GetFileName(pdfPath))
        {
            SourceUri = pdfPath
        };
        var result = await chat.LoadDocumentAsync(pdfPath, metadata);
        Console.WriteLine($"Loaded: {result.Name} ({result.IndexingMode})");

        // Ask questions
        var response = await chat.SubmitAsync("What are the key points in this document?");
        
        Console.WriteLine();
        Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
        Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");
    }
}

Use this pattern to integrate PDF chat into web APIs, desktop apps, or document processing pipelines.


🛠️ Getting Started

📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 8.0+

📥 Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/chat_with_pdf

Project Link: chat_with_pdf (same path as above)

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a chat model by typing 0-8, or paste a custom model URI.
  2. Wait for models to download (first run) and load.
  3. Select processing mode: 0 for standard extraction, 1 for vision understanding.
  4. Enter the path to a PDF file when prompted.
  5. Optionally load additional documents (answer y to "Load another document?").
  6. Start asking questions about your documents.
  7. Use commands (/help, /status, /add, etc.) as needed.
  8. Press Enter on an empty prompt to exit.

🔍 Notes on Key Types

Core Classes

  • PdfChat (LMKit.Retrieval) - main class for PDF question-answering:

    • Manages document loading, indexing, and conversation state.
    • Automatically chooses between full-context and passage retrieval.
    • Supports multi-turn conversation with context preservation.
    • Implements IMultiTurnConversation and IDisposable.
  • IVectorStore (LMKit.Data.Storage) - interface for embedding storage:

    • FileSystemVectorStore: built-in filesystem-based implementation.
    • Enables persistent caching of document embeddings across sessions.
    • Implement custom backends for databases or cloud storage.
  • DocumentMetadata - rich metadata for source tracking:

    • Name: display name for the document.
    • SourceUri: original location or reference URL.
    • AdditionalMetadata: custom key-value pairs for attribution.

Result Types

  • DocumentIndexingResult: returned when loading a document:

    • Name: the document name (from metadata or file name).
    • IndexingMode: FullDocument or PassageRetrieval.
    • PageCount: total pages in the document.
    • TokenCount: estimated tokens for the document.
    • ExceededTokenBudget: whether it exceeded the full-context limit.
  • DocumentQueryResult: returned when asking a question:

    • Response: the TextGenerationResult with completion and stats.
    • SourceReferences: list of DocumentReference objects (document name, page number, metadata).
    • HasSourceReferences: whether passages were used (vs. full context).
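
Since HasSourceReferences distinguishes passage-retrieval answers from full-context answers, a small branch (a sketch using only the properties above) makes the difference visible:

var answer = await chat.SubmitAsync("Summarize the risk factors.");

if (answer.HasSourceReferences)
{
    foreach (var src in answer.SourceReferences)
        Console.WriteLine($"  {src.Name}, page {src.PageNumber}");
}
else
{
    Console.WriteLine("  (answered from full-document context)");
}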

Processing Classes

  • VlmOcr (LMKit.Extraction.Ocr) - vision-based document parser:

    • Analyzes page images to understand structure and content.
    • Set via chat.DocumentVisionParser property.
  • TesseractOcr (LMKit.Integrations.Tesseract) - OCR fallback engine:

    • Extracts text from image-based pages.
    • Set via chat.OcrEngine property.
  • RagEngine.RagReranker (LMKit.Retrieval) - cross-encoder reranker:

    • Improves retrieval accuracy with learned relevance scoring.
    • Set via chat.Reranker property.

Key Events

Event                      Description
DocumentImportProgress     Page processing and embedding generation progress.
CacheAccessed              Cache hit/miss notifications with document identifier.
PassageRetrievalCompleted  Retrieval results with timing and source references.
ResponseGenerationStarted  Signals when generation begins; includes context mode.
AfterTextCompletion        Streaming text output during response generation.
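
A sketch subscribing to these events; the CacheAccessed, PassageRetrievalCompleted, and AfterTextCompletion handlers use argument properties shown earlier in this guide, while the remaining two argument shapes are assumptions:

chat.CacheAccessed += (s, e) =>
    Console.WriteLine(e.IsHit ? $"Cache hit: {e.DocumentName}" : $"Cache miss: {e.DocumentName}");

chat.PassageRetrievalCompleted += (s, e) =>
    Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);

// Argument properties for these two are assumptions; inspect the event args in your IDE.
chat.DocumentImportProgress += (s, e) => { /* report per-page progress */ };
chat.ResponseGenerationStarted += (s, e) => { /* log the context mode */ };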

⚠️ Troubleshooting

  • "No documents have been loaded"

    • You must load at least one PDF before asking questions.
    • Check the file path and try again.
  • "File not found"

    • Verify the PDF path is correct and the file exists.
    • Try using absolute paths or quoting paths with spaces.
  • Slow document loading

    • First load processes and indexes the document (expected).
    • Subsequent loads use cache and should be near-instant.
    • Try a smaller model or standard extraction mode for faster processing.
  • Out-of-memory or driver errors

    • VRAM insufficient for selected model(s).
    • Pick smaller models (e.g., Qwen 3 2B, Ministral 3 3B).
    • Use standard extraction instead of vision understanding.
  • Poor answer quality

    • Try a larger chat model for better reasoning.
    • Use vision-based document understanding for complex layouts.
    • Increase MaxRetrievedPassages for more context.
    • Lower MinRelevanceScore to include more passages.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Add a Reranker for improved passage selection.
  • Missing passages in answers

    • Document may be using full-context mode (no passage retrieval).
    • Check with /status command to see document modes.
    • For large documents, relevant content may not be in top-k passages.
  • Cache issues

    • Delete the cache directory to force reprocessing.
    • Default location: %LocalAppData%\LMKit\ChatWithPDF\Cache
    • Verify IVectorStore is configured correctly.
  • "Cannot modify this setting after documents have been loaded"

    • Some properties (DocumentProcessingModality) must be set before loading documents.
    • Call ClearDocuments() first, then reconfigure.
  • "DocumentVisionParser must be set when PageProcessingMode is DocumentUnderstanding"

    • Assign a VlmOcr instance before loading documents when using vision mode.

🔧 Extend the Demo

  • Web API integration: expose PdfChat as a REST endpoint for document QA services.
  • Batch processing: process entire document libraries and build searchable knowledge bases.
  • Custom system prompts: tailor the assistant's behavior for specific domains (legal, medical, technical).
  • Custom vector stores: implement IVectorStore for database-backed caching (PostgreSQL, Redis, etc.).
  • Advanced retrieval:
    • Adjust MaxRetrievedPassages and MinRelevanceScore for your use case.
    • Add a Reranker for cross-encoder relevance scoring.
    • Enable IncludePageRenderingsInContext for visual grounding.
    • Implement custom reranking logic using the PassageRetrievalCompleted event.
  • Conversation export: save chat history for audit trails or further analysis.
  • Multi-modal responses: combine with image generation or charting for visual answers.
  • Integration with other LM-Kit features:
    • Chain with Text Analysis for entity extraction from answers.
    • Use Structured Extraction to pull specific data fields.
    • Connect to Function Calling for automated document workflows.

📚 Additional Resources