👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/chat_with_pdf
Chat with PDF in .NET Applications
🎯 Purpose of the Sample
Chat with PDF demonstrates how to use LM-Kit.NET to build an intelligent document question-answering system that combines vision-capable models with semantic retrieval to answer questions about PDF documents.
The sample shows how to:
- Download and load vision models and embedding models with progress callbacks.
- Create a `PdfChat` instance that handles document understanding automatically.
- Configure document caching using the `IVectorStore` interface for persistent indexing.
- Choose between text extraction (faster) and vision-based document understanding (better for complex layouts).
- Load one or multiple PDF documents with automatic caching and indexing.
- Ask natural language questions and receive grounded, context-aware responses.
- Monitor document import progress, cache hits, and passage retrieval through events.
Why PDF Chat with LM-Kit.NET?
- Local-first: all processing runs on your hardware; no cloud dependencies for sensitive documents.
- Smart context management: small documents are included in full; large documents use intelligent passage retrieval.
- Vision-powered: optionally use vision models to understand complex layouts, tables, and scanned pages.
- Rich telemetry: track document processing, cache utilization, retrieval performance, and generation stats.
- Multi-document: load multiple PDFs and query across all of them in a single conversation.
- Flexible caching: use the built-in filesystem cache or implement custom storage backends.
👥 Target Audience
- Enterprise Developers: build secure, on-premise document QA systems.
- Legal & Compliance: analyze contracts, policies, and regulatory documents locally.
- Research & Academia: query research papers, reports, and technical documentation.
- Back-Office & RPA: automate document analysis workflows without cloud exposure.
- Demo & Education: explore RAG (Retrieval-Augmented Generation) concepts in a practical C# example.
🚀 Problem Solved
- Ask questions in natural language: query PDFs like you would ask a knowledgeable assistant.
- Handle complex layouts: vision-based understanding interprets tables, multi-column text, and forms.
- Manage large documents: automatic chunking and semantic retrieval for documents that exceed context limits.
- Multi-document queries: load several PDFs and ask questions that span multiple sources.
- Performance optimization: built-in caching via `IVectorStore` eliminates redundant processing on repeated loads.
- Source transparency: see which passages were used to generate each answer.
💻 Sample Application Description
Console app that:
- Lets you choose a vision model for chat (or paste a custom model URI).
- Automatically loads an embedding model for semantic search.
- Offers two processing modes: standard text extraction or vision-based document understanding.
- Downloads models if needed, with live progress updates.
- Creates a `PdfChat` instance with filesystem-based caching via `FileSystemVectorStore`.
- Prompts you to load one or more PDF documents.
- Enters an interactive chat loop where you can:
  - Ask questions about your documents.
  - See which passages were retrieved and from which pages.
  - View generation statistics (tokens, speed, context usage).
- Supports commands for managing documents and conversation state.
- Loops until you press Enter on an empty prompt to quit.
✨ Key Features
- 📚 Dual processing modes:
  - Standard extraction: fast text-based processing with OCR fallback.
  - Vision understanding: multimodal analysis for complex layouts using `VlmOcr`.
- 🔍 Smart retrieval: automatically decides between full-context and passage retrieval based on document size.
- 💾 Pluggable caching: use `FileSystemVectorStore` or implement custom `IVectorStore` backends.
- 📊 Rich event system:
  - Document import progress (page-by-page processing).
  - Cache hit/miss notifications.
  - Passage retrieval with timing and source references.
  - Response generation status.
- 💬 Multi-turn conversation: follow-up questions maintain context for natural dialogue.
- 🔄 Conversation commands: reset, restart, add documents, regenerate, view status.
- 📈 Generation stats: tokens generated, speed, and context utilization per response.
🧰 Built-In Models (menu)
On startup, the sample shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | MiniCPM-o 2.6 (8.1B) | ~5.9 GB VRAM |
| 1 | Alibaba Qwen 3 2B (vision) | ~2.5 GB VRAM |
| 2 | Alibaba Qwen 3 4B (vision) | ~4 GB VRAM |
| 3 | Alibaba Qwen 3 8B (vision) | ~6.5 GB VRAM |
| 4 | Google Gemma 3 4B (vision) | ~5.7 GB VRAM |
| 5 | Google Gemma 3 12B (vision) | ~11 GB VRAM |
| 6 | Mistral Ministral 3 3B (vision) | ~3.5 GB VRAM |
| 7 | Mistral Ministral 3 8B (vision) | ~6.5 GB VRAM |
| 8 | Mistral Ministral 3 14B (vision) | ~12 GB VRAM |
| other | Custom model URI | depends on model |
Any input other than `0`-`8` is treated as a custom model URI and passed directly to the `LM` constructor.
Additional models loaded automatically:
- Embedding model: `embeddinggemma-300m`, used for semantic passage retrieval.
- Vision OCR model (when using document understanding): `lightonocr1025:1b`, a lightweight vision model for page analysis.
🧠 Supported Models
The sample is pre-wired to LM-Kit's predefined model cards:
Chat models: `minicpm-o`, `qwen3-vl:2b` / `qwen3-vl:4b` / `qwen3-vl:8b`, `gemma3:4b` / `gemma3:12b`, `ministral3:3b` / `ministral3:8b` / `ministral3:14b`

Embedding model: `embeddinggemma-300m`

Document understanding model: `lightonocr1025:1b`
Internally:
```csharp
// Chat model selection
modelUri = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri;

// Embedding model (auto-loaded)
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

// Vision OCR for document understanding
var visionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));
```
You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.
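For instance, a custom model can be loaded through the same `LM(Uri)` constructor shown above; the URIs and paths below are placeholders:

```csharp
// Placeholder URI; any reachable model location works.
var remoteModel = new LM(new Uri("https://example.com/models/my-model.gguf"));

// Local files load via a file URI (absolute local paths convert implicitly):
var localModel = new LM(new Uri(Path.GetFullPath(@"models\my-model.gguf")));
```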
🛠️ Commands & Flow
Startup Flow
- Model selection: choose a vision-capable chat model (0-8) or paste a custom URI.
- Model loading: chat and embedding models download (if needed) and load with progress reporting (a sketch follows this list).
- Processing mode selection:
  - `0`: standard text extraction (faster; uses Tesseract OCR as a fallback).
  - `1`: vision-based document understanding (better for complex layouts; loads an additional vision model).
- Document loading: prompted to enter PDF path(s). Can load multiple documents.
- Chat loop: ask questions, receive answers with source references.
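A rough sketch of the progress wiring: the `downloadingProgress`/`loadingProgress` parameter names and delegate shapes below are assumptions, so check the sample source for the exact signatures.

```csharp
// Assumption: the LM constructor accepts optional progress delegates, as the
// sample's ModelDownloadingProgress / ModelLoadingProgress handlers suggest.
var chatModel = new LM(new Uri(modelUri),
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        if (contentLength.HasValue)
            Console.Write($"\rDownloading model {100.0 * bytesRead / contentLength.Value:F2}%");
        else
            Console.Write($"\rDownloading model {bytesRead:N0} bytes");
        return true; // returning false would cancel
    },
    loadingProgress: progress =>
    {
        Console.Write($"\rLoading model {progress * 100:F0}%");
        return true;
    });
```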
Interactive Commands
Inside the chat loop, type these commands instead of a question:
| Command | Description |
|---|---|
| `/help` | Show all available commands with descriptions. |
| `/status` | Display loaded documents, token usage, and configuration details. |
| `/add` | Add more PDF documents to the current collection (clears chat history). |
| `/restart` | Clear chat history but keep all loaded documents. |
| `/reset` | Remove all documents and clear chat history (prompts for new documents). |
| `/regenerate` | Generate a new response to your last question. |
| (empty) | Press Enter on empty prompt to exit the application. |
Per-Question Flow
- Type your question and press Enter.
- Passage retrieval (if using chunked documents): relevant passages are retrieved with timing info.
- Response generation: the model generates an answer, streamed to console in real-time.
- Stats display: tokens generated, generation speed, and context utilization.
- Repeat or use a command.
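Put together, the per-question flow is a small loop; a sketch built only from the APIs shown elsewhere in this README (`SubmitAsync`, `AfterTextCompletion`, `SourceReferences`):

```csharp
// Stream each answer to the console as it is generated.
chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);

while (true)
{
    Console.Write("\n> ");
    string question = Console.ReadLine();

    if (string.IsNullOrWhiteSpace(question))
        break; // empty prompt exits, as in the sample

    var result = await chat.SubmitAsync(question);

    // Show which passages grounded the answer.
    foreach (var source in result.SourceReferences)
        Console.WriteLine($"\n[{source.Name}, page {source.PageNumber}]");
}
```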
🗣️ Example Use Cases
Try the sample with:
- A contract or legal agreement → ask "What are the termination conditions?" or "Summarize the payment terms."
- A research paper → ask "What methodology did the authors use?" or "What were the main findings?"
- A technical manual → ask "How do I configure the network settings?" or "What are the system requirements?"
- A financial report → ask "What was the revenue growth year-over-year?" or "Summarize the risk factors."
- Multiple related documents → load several PDFs and ask "Compare the approaches described in these papers."
After each response, inspect:
- Retrieved passages: do they match the most relevant sections?
- Source references: correct document and page numbers?
- Generation stats: acceptable latency for your use case?
📊 Document Processing Modes
Standard Text Extraction
```
Processing Mode: 0 - Standard text extraction (faster)
```
- Extracts text directly from PDF structure.
- Uses Tesseract OCR as fallback for scanned/image-based pages.
- Best for: clean PDFs with simple layouts, text-heavy documents.
Vision-Based Document Understanding
```
Processing Mode: 1 - Vision-based document understanding (better for complex layouts)
```
- Loads an additional vision model (`lightonocr1025:1b`).
- Analyzes pages visually to understand layout, structure, and relationships.
- Best for: multi-column layouts, tables, forms, scanned documents, mixed content.
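Choosing mode `1` corresponds to the configuration below, using the same calls as the integration snippet later in this README:

```csharp
// Vision-based document understanding; set before loading any documents.
chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));
```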
📦 Document Indexing Modes
When you load a document, PdfChat automatically decides how to process it:
Full Document Mode
- When: the document fits within `FullDocumentTokenBudget` (default: 4096 tokens).
- Behavior: the entire document is included in every query context.
- Advantage: model has complete context, best for small documents.
- Trade-off: uses more context space, limiting room for conversation history.
Passage Retrieval Mode
- When: document exceeds the token budget.
- Behavior: document is chunked and indexed for semantic search.
- Advantage: handles documents of any size efficiently.
- Trade-off: only relevant passages are included (may miss tangentially related content).
The sample reports which mode was used after loading:
```
✓ Loaded: report.pdf
  Pages: 45
  Tokens: 28,500
  Mode: passage retrieval
  ⚠ Exceeded token budget → using passage retrieval
```
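The same information is available programmatically on the `DocumentIndexingResult` returned by `LoadDocumentAsync` (fields described under Result Types below):

```csharp
var result = await chat.LoadDocumentAsync("report.pdf");

Console.WriteLine($"Loaded: {result.Name}");
Console.WriteLine($"Pages: {result.PageCount}");
Console.WriteLine($"Tokens: {result.TokenCount:N0}");
Console.WriteLine($"Mode: {result.IndexingMode}");

if (result.ExceededTokenBudget)
    Console.WriteLine("Exceeded token budget → using passage retrieval");
```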
💾 Document Caching with IVectorStore
PdfChat supports persistent caching of processed documents through the IVectorStore interface. This eliminates redundant processing when the same document is loaded across sessions.
FileSystemVectorStore (Built-in)
The sample uses FileSystemVectorStore for local filesystem caching:
```csharp
using LMKit.Data.Storage;

string cacheDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "LMKit", "ChatWithPDF", "Cache");

IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

// Pass to the PdfChat constructor
PdfChat chat = new PdfChat(chatModel, embeddingModel, vectorStore);
```
Cache Behavior
- On cache hit: document loads instantly from pre-indexed data.
- On cache miss: full processing (page extraction, embedding generation) occurs, then results are cached.
Monitor cache access via the `CacheAccessed` event:

```csharp
chat.CacheAccessed += (sender, e) =>
{
    if (e.IsHit)
        Console.WriteLine($"Cache hit: {e.DocumentName} loaded instantly");
    else
        Console.WriteLine($"Cache miss: {e.DocumentName} will be processed");
};
```
Disabling Caching
To disable caching, omit the `vectorStore` parameter:

```csharp
// No caching: documents are processed fresh each time
PdfChat chat = new PdfChat(chatModel, embeddingModel);
```
Custom Vector Store Implementations
Implement `IVectorStore` for custom backends (databases, cloud storage, etc.):

```csharp
public class CustomVectorStore : IVectorStore
{
    // Implement interface methods for your storage backend
    public Task<bool> CollectionExistsAsync(string collectionId, CancellationToken ct) { ... }

    // ... other methods
}
```
📋 Document Metadata
When loading documents, you can attach rich metadata for source tracking and attribution:
DocumentMetadata Class
```csharp
var metadata = new DocumentMetadata("Q4 Financial Report")
{
    SourceUri = "https://intranet.example.com/docs/q4-report.pdf",
    AdditionalMetadata = new MetadataCollection
    {
        { "author", "Finance Team" },
        { "department", "Corporate Finance" },
        { "confidentiality", "Internal" },
        { "fiscal_year", "2024" }
    }
};

var result = await chat.LoadDocumentAsync("report.pdf", metadata);
```
Accessing Metadata in Query Results
Source references include the metadata you specified:
```csharp
var response = await chat.SubmitAsync("What was the Q4 revenue?");

foreach (var source in response.SourceReferences)
{
    Console.WriteLine($"Source: {source.Name}, Page {source.PageNumber}");

    if (source.Metadata?.AdditionalMetadata != null)
    {
        if (source.Metadata.AdditionalMetadata.TryGet("author", out var author))
            Console.WriteLine($"  Author: {author.Value}");

        if (source.Metadata.AdditionalMetadata.TryGet("department", out var dept))
            Console.WriteLine($"  Department: {dept.Value}");
    }
}
```
Default Metadata
If no metadata is provided, PdfChat creates default metadata using the file name:
```csharp
// Equivalent to: new DocumentMetadata(Path.GetFileName(filePath))
var result = chat.LoadDocument("report.pdf");
```
🔧 Advanced Configuration
Reranking for Improved Retrieval
Use a cross-encoder reranker to improve passage retrieval accuracy:
```csharp
using LMKit.Retrieval;

// Load a reranker model
var rerankerModel = LM.LoadFromModelID("reranker-model-id");

// Configure reranking
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);
```
The `RerankAlpha` parameter (0.0–1.0) controls blending between the raw embedding similarity and the rerank score:
- `0.0`: use only the embedding similarity.
- `1.0`: use only the rerank score.
- `0.7` (recommended): blend favoring the rerank score.
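The exact formula is internal to LM-Kit; assuming a simple linear interpolation, the blend behaves like this hypothetical sketch:

```csharp
// Hypothetical illustration; LM-Kit's internal scoring may differ.
static float BlendScore(float embeddingSimilarity, float rerankScore, float alpha) =>
    (1f - alpha) * embeddingSimilarity + alpha * rerankScore;

// alpha = 0.7 weights the rerank score more heavily:
// 0.3 * 0.62 + 0.7 * 0.91 ≈ 0.82
float blended = BlendScore(0.62f, 0.91f, 0.7f);
```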
Visual Grounding with Page Renderings
Include page images alongside retrieved passages for visual context:
```csharp
// Enable page renderings in retrieval context
chat.IncludePageRenderingsInContext = true;
```
When enabled:
- Page images corresponding to retrieved passages are injected into the context.
- Allows the model to visually interpret tables, charts, and figures.
- Increases token consumption proportionally to unique pages referenced.
- Requires a vision-capable chat model for best results.
Document Processing Modality
Control whether pages are processed as images or text-only:
```csharp
// Multimodal: pages may be processed as images (default)
chat.DocumentProcessingModality = InferenceModality.Multimodal;

// Text-only: use extracted text only
chat.DocumentProcessingModality = InferenceModality.Text;
```
Note: This property must be set before loading documents.
Retrieval Parameters
Fine-tune passage retrieval behavior:
```csharp
// Maximum passages retrieved per query (default: 5)
chat.MaxRetrievedPassages = 10;

// Minimum relevance score threshold (default: 0.5)
// Higher = fewer but more precise matches
chat.MinRelevanceScore = 0.6f;

// Token budget for full-document mode (default: 4096)
// Documents exceeding this use passage retrieval
chat.FullDocumentTokenBudget = 8192;

// Prefer including small documents in full (default: true)
chat.PreferFullDocumentContext = true;
```
Reasoning and Sampling
Configure response generation behavior:
```csharp
// Enable extended reasoning for complex questions
chat.ReasoningLevel = ReasoningLevel.Extended;

// Use temperature-based sampling for varied responses
chat.SamplingMode = new RandomSampling { Temperature = 0.7f };

// Or stick with deterministic outputs (default)
chat.SamplingMode = new GreedyDecoding();

// Maximum tokens per response (default: 2048)
chat.MaximumCompletionTokens = 4096;
```
⚙️ Behavior & Policies (quick reference)
- Model selection: exactly one chat model per process. To change models, restart the app.
- Download & load:
  - `ModelDownloadingProgress` prints `Downloading model XX.XX%` or byte counts.
  - `ModelLoadingProgress` prints `Loading model XX%` and clears the console once done.
- Document caching:
  - Uses the `IVectorStore` interface (the sample uses `FileSystemVectorStore`).
  - Default cache location: `%LocalAppData%/LMKit/ChatWithPDF/Cache`
  - On cache hit: instant load from pre-indexed data.
  - On cache miss: full processing (page extraction, embedding generation).
- Passage retrieval:
  - Default `MaxRetrievedPassages`: 5
  - Default `MinRelevanceScore`: 0.5
  - Results sorted by document, then page number, then partition index.
- Response generation:
  - Streaming output via the `AfterTextCompletion` event.
  - Internal reasoning shown in dark blue.
  - Tool invocations shown in dark yellow.
- Licensing:
  - You can set an optional license key via `LicenseManager.SetLicenseKey("")`.
  - A free community license is available from the LM-Kit website.
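For example (namespace import omitted; see the LM-Kit docs for the exact location of `LicenseManager`):

```csharp
// Optional: set a license key once at startup, before loading any model.
LicenseManager.SetLicenseKey("your-license-key"); // the sample passes ""
```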
💻 Minimal Integration Snippet
```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using LMKit.Data.Storage;
using LMKit.Extraction.Ocr;
using LMKit.Model;
using LMKit.Retrieval;

public class PdfChatSample
{
    public async Task RunChat(string modelUri, string pdfPath)
    {
        // Load models
        var chatModel = new LM(new Uri(modelUri));
        var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

        // Configure caching with a vector store
        string cacheDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "LMKit", "PdfChat", "Cache");
        IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

        // Create the PdfChat instance with caching
        using var chat = new PdfChat(chatModel, embeddingModel, vectorStore)
        {
            PreferFullDocumentContext = true
        };

        // Optional: enable vision-based document understanding
        chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
        chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

        // Optional: include page images in the retrieval context
        // chat.IncludePageRenderingsInContext = true;

        // Subscribe to events
        chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);
        chat.PassageRetrievalCompleted += (s, e) =>
            Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

        // Load the document with optional metadata
        var metadata = new DocumentMetadata(Path.GetFileName(pdfPath))
        {
            SourceUri = pdfPath
        };
        var result = await chat.LoadDocumentAsync(pdfPath, metadata);
        Console.WriteLine($"Loaded: {result.Name} ({result.IndexingMode})");

        // Ask questions
        var response = await chat.SubmitAsync("What are the key points in this document?");
        Console.WriteLine();
        Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
        Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");
    }
}
```
Use this pattern to integrate PDF chat into web APIs, desktop apps, or document processing pipelines.
🛠️ Getting Started
📋 Prerequisites
- .NET Framework 4.6.2 or .NET 8.0+
📥 Download
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/chat_with_pdf
```
Project Link: chat_with_pdf (same path as above)
▶️ Run
```bash
dotnet build
dotnet run
```
Then:
- Select a chat model by typing 0-8, or paste a custom model URI.
- Wait for models to download (first run) and load.
- Select processing mode: `0` for standard extraction, `1` for vision understanding.
- Enter the path to a PDF file when prompted.
- Optionally load additional documents (answer `y` to "Load another document?").
- Start asking questions about your documents.
- Use commands (`/help`, `/status`, `/add`, etc.) as needed.
- Press Enter on an empty prompt to exit.
🔍 Notes on Key Types
Core Classes
- `PdfChat` (LMKit.Retrieval): the main class for PDF question answering.
  - Manages document loading, indexing, and conversation state.
  - Automatically chooses between full-context and passage retrieval.
  - Supports multi-turn conversation with context preservation.
  - Implements `IMultiTurnConversation` and `IDisposable`.
- `IVectorStore` (LMKit.Data.Storage): interface for embedding storage.
  - `FileSystemVectorStore`: built-in filesystem-based implementation.
  - Enables persistent caching of document embeddings across sessions.
  - Implement custom backends for databases or cloud storage.
- `DocumentMetadata`: rich metadata for source tracking.
  - `Name`: display name for the document.
  - `SourceUri`: original location or reference URL.
  - `AdditionalMetadata`: custom key-value pairs for attribution.
Result Types
- `DocumentIndexingResult`: returned when loading a document.
  - `Name`: the document name (from metadata or file name).
  - `IndexingMode`: `FullDocument` or `PassageRetrieval`.
  - `PageCount`: total pages in the document.
  - `TokenCount`: estimated tokens for the document.
  - `ExceededTokenBudget`: whether it exceeded the full-context limit.
- `DocumentQueryResult`: returned when asking a question.
  - `Response`: the `TextGenerationResult` with completion and stats.
  - `SourceReferences`: list of `DocumentReference` objects (document name, page number, metadata).
  - `HasSourceReferences`: whether passages were used (vs. full context).
Processing Classes
- `VlmOcr` (LMKit.Extraction.Ocr): vision-based document parser.
  - Analyzes page images to understand structure and content.
  - Set via the `chat.DocumentVisionParser` property.
- `TesseractOcr` (LMKit.Integrations.Tesseract): OCR fallback engine.
  - Extracts text from image-based pages.
  - Set via the `chat.OcrEngine` property.
- `RagEngine.RagReranker` (LMKit.Retrieval): cross-encoder reranker.
  - Improves retrieval accuracy with learned relevance scoring.
  - Set via the `chat.Reranker` property.
Key Events
| Event | Description |
|---|---|
| `DocumentImportProgress` | Page processing and embedding generation progress. |
| `CacheAccessed` | Cache hit/miss notifications with document identifier. |
| `PassageRetrievalCompleted` | Retrieval results with timing and source references. |
| `ResponseGenerationStarted` | Signals when generation begins; includes context mode. |
| `AfterTextCompletion` | Streaming text output during response generation. |
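Subscriptions for `CacheAccessed`, `PassageRetrievalCompleted`, and `AfterTextCompletion` appear earlier in this README; for the remaining two, a sketch with assumed event-arg members (`Page`, `PageCount`, and `ContextMode` are guesses, so check the actual argument types):

```csharp
// "Page", "PageCount", and "ContextMode" are assumed member names for illustration only.
chat.DocumentImportProgress += (s, e) =>
    Console.WriteLine($"Importing page {e.Page}/{e.PageCount}");

chat.ResponseGenerationStarted += (s, e) =>
    Console.WriteLine($"Generating response ({e.ContextMode})...");
```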
⚠️ Troubleshooting
"No documents have been loaded"
- You must load at least one PDF before asking questions.
- Check the file path and try again.
"File not found"
- Verify the PDF path is correct and the file exists.
- Try using absolute paths or quoting paths with spaces.
Slow document loading
- First load processes and indexes the document (expected).
- Subsequent loads use cache and should be near-instant.
- Try a smaller model or standard extraction mode for faster processing.
Out-of-memory or driver errors
- VRAM insufficient for selected model(s).
- Pick smaller models (e.g., Qwen 3 2B, Ministral 3 3B).
- Use standard extraction instead of vision understanding.
Poor answer quality
- Try a larger chat model for better reasoning.
- Use vision-based document understanding for complex layouts.
- Increase `MaxRetrievedPassages` for more context.
- Lower `MinRelevanceScore` to include more passages.
- Enable `IncludePageRenderingsInContext` for visual grounding.
- Add a `Reranker` for improved passage selection.
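These knobs compose; a sketch applying all four, using only the properties documented above (the reranker model ID is a placeholder):

```csharp
// Broaden retrieval: more passages, lower relevance cutoff.
chat.MaxRetrievedPassages = 10;
chat.MinRelevanceScore = 0.4f;

// Ground answers visually on the referenced pages (works best with a vision-capable chat model).
chat.IncludePageRenderingsInContext = true;

// Re-score candidates with a cross-encoder ("reranker-model-id" is a placeholder).
var rerankerModel = LM.LoadFromModelID("reranker-model-id");
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);
```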
Missing passages in answers
- Document may be using full-context mode (no passage retrieval).
- Check with the `/status` command to see document modes.
- For large documents, relevant content may not be in the top-k passages.
Cache issues
- Delete the cache directory to force reprocessing.
- Default location: `%LocalAppData%/LMKit/ChatWithPDF/Cache`
- Verify `IVectorStore` is configured correctly.
"Cannot modify this setting after documents have been loaded"
- Some properties (e.g., `DocumentProcessingModality`) must be set before loading documents.
- Call `ClearDocuments()` first, then reconfigure.
"DocumentVisionParser must be set when PageProcessingMode is DocumentUnderstanding"
- Assign a `VlmOcr` instance before loading documents when using vision mode.
🔧 Extend the Demo
- Web API integration: expose `PdfChat` as a REST endpoint for document QA services (see the sketch at the end of this section).
- Batch processing: process entire document libraries and build searchable knowledge bases.
- Custom system prompts: tailor the assistant's behavior for specific domains (legal, medical, technical).
- Custom vector stores: implement `IVectorStore` for database-backed caching (PostgreSQL, Redis, etc.).
- Advanced retrieval:
  - Adjust `MaxRetrievedPassages` and `MinRelevanceScore` for your use case.
  - Add a `Reranker` for cross-encoder relevance scoring.
  - Enable `IncludePageRenderingsInContext` for visual grounding.
  - Implement custom reranking logic using the `PassageRetrievalCompleted` event.
- Conversation export: save chat history for audit trails or further analysis.
- Multi-modal responses: combine with image generation or charting for visual answers.
- Integration with other LM-Kit features:
- Chain with Text Analysis for entity extraction from answers.
- Use Structured Extraction to pull specific data fields.
- Connect to Function Calling for automated document workflows.
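For the Web API item at the top of this list, a rough ASP.NET Core minimal-API sketch; the endpoint shapes, the `qwen3-vl:4b` model choice via `LoadFromModelID`, the `ToString()` call on the response, and the single-instance locking are illustrative assumptions, not part of the sample:

```csharp
using LMKit.Data.Storage;
using LMKit.Model;
using LMKit.Retrieval;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// One shared PdfChat; access is serialized because it holds multi-turn conversation state.
var chatModel = LM.LoadFromModelID("qwen3-vl:4b");              // assumption: chat model via model ID
var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
var chat = new PdfChat(chatModel, embeddingModel, new FileSystemVectorStore("cache"));
var gate = new SemaphoreSlim(1, 1);

// Load a PDF into the shared collection.
app.MapPost("/documents", async (string path) =>
{
    await gate.WaitAsync();
    try { return Results.Ok(await chat.LoadDocumentAsync(path)); }
    finally { gate.Release(); }
});

// Ask a question against all loaded documents.
app.MapPost("/ask", async (string question) =>
{
    await gate.WaitAsync();
    try
    {
        var result = await chat.SubmitAsync(question);
        return Results.Ok(new
        {
            answer = result.Response.ToString(), // assumption: completion text via ToString()
            sources = result.SourceReferences
        });
    }
    finally { gate.Release(); }
});

app.Run();
```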