🚀 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/chat_with_pdf
Chat with PDF in .NET Applications
🎯 Purpose of the Sample
Chat with PDF demonstrates how to use LM-Kit.NET to build an intelligent document question-answering system that combines vision-capable models with semantic retrieval to answer questions about PDF documents.
The sample shows how to:
- Download and load vision models and embedding models with progress callbacks.
- Create a `PdfChat` instance that handles document understanding automatically.
- Configure document caching using the `IVectorStore` interface for persistent indexing.
- Choose between text extraction (faster) and vision-based document understanding (better for complex layouts).
- Load one or multiple PDF documents with automatic caching and indexing.
- Ask natural language questions and receive grounded, context-aware responses.
- Monitor document import progress, cache hits, and passage retrieval through events.
Why PDF Chat with LM-Kit.NET?
- Local-first: all processing runs on your hardware; no cloud dependencies for sensitive documents.
- Smart context management: small documents are included in full; large documents use intelligent passage retrieval.
- Vision-powered: optionally use vision models to understand complex layouts, tables, and scanned pages.
- Rich telemetry: track document processing, cache utilization, retrieval performance, and generation stats.
- Multi-document: load multiple PDFs and query across all of them in a single conversation.
- Flexible caching: use the built-in filesystem cache or implement custom storage backends.
👥 Target Audience
- Enterprise Developers: build secure, on-premise document QA systems.
- Legal & Compliance: analyze contracts, policies, and regulatory documents locally.
- Research & Academia: query research papers, reports, and technical documentation.
- Back-Office & RPA: automate document analysis workflows without cloud exposure.
- Demo & Education: explore RAG (Retrieval-Augmented Generation) concepts in a practical C# example.
🚀 Problem Solved
- Ask questions in natural language: query PDFs like you would ask a knowledgeable assistant.
- Handle complex layouts: vision-based understanding interprets tables, multi-column text, and forms.
- Manage large documents: automatic chunking and semantic retrieval for documents that exceed context limits.
- Multi-document queries: load several PDFs and ask questions that span multiple sources.
- Performance optimization: built-in caching via `IVectorStore` eliminates redundant processing on repeated loads.
- Source transparency: see which passages were used to generate each answer.
💻 Sample Application Description
Console app that:
- Lets you choose a vision model for chat (or paste a custom model URI).
- Automatically loads an embedding model for semantic search.
- Offers two processing modes: standard text extraction or vision-based document understanding.
- Downloads models if needed, with live progress updates.
- Creates a `PdfChat` instance with filesystem-based caching via `FileSystemVectorStore`.
- Prompts you to load one or more PDF documents.
- Enters an interactive chat loop where you can:
- Ask questions about your documents.
- See which passages were retrieved and from which pages.
- View generation statistics (tokens, speed, context usage).
- Supports commands for managing documents and conversation state.
- Loops until you press Enter on an empty prompt to quit.
✨ Key Features
- 📄 Dual processing modes:
  - Standard extraction: fast text-based processing with OCR fallback.
  - Vision understanding: multimodal analysis for complex layouts using `VlmOcr`.
- 🔍 Smart retrieval: automatically decides between full-context and passage retrieval based on document size.
- 💾 Pluggable caching: use `FileSystemVectorStore` or implement custom `IVectorStore` backends.
- 📡 Rich event system:
- Document import progress (page-by-page processing).
- Cache hit/miss notifications.
- Passage retrieval with timing and source references.
- Response generation status.
- 💬 Multi-turn conversation: follow-up questions maintain context for natural dialogue.
- 🎛️ Conversation commands: reset, restart, add documents, regenerate, view status.
- 📊 Generation stats: tokens generated, speed, and context utilization per response.
🧰 Built-In Models (menu)
On startup, the sample shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | MiniCPM-o 2.6 8.1B | ~5.9 GB VRAM |
| 1 | Alibaba Qwen 3 2B (vision) | ~2.5 GB VRAM |
| 2 | Alibaba Qwen 3 4B (vision) | ~4 GB VRAM |
| 3 | Alibaba Qwen 3 8B (vision) | ~6.5 GB VRAM |
| 4 | Google Gemma 3 4B (vision) | ~5.7 GB VRAM |
| 5 | Google Gemma 3 12B (vision) | ~11 GB VRAM |
| 6 | Mistral Ministral 3 3B (vision) | ~3.5 GB VRAM |
| 7 | Mistral Ministral 3 8B (vision) | ~6.5 GB VRAM |
| 8 | Mistral Ministral 3 14B (vision) | ~12 GB VRAM |
| other | Custom model URI | depends on model |
Any input other than `0`-`8` is treated as a custom model URI and passed directly to the `LM` constructor.
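A sketch of how that selection could be wired (the ID array mirrors the menu above; `ModelCard.GetPredefinedModelCardByModelID` appears later in this README, but the helper itself and the `string`-typed `ModelUri` are assumptions, not the sample's exact code):

```csharp
// Illustrative only: resolve menu input to a model URI.
static string ResolveModelUri(string input)
{
    string[] menuModelIds =
    {
        "minicpm-o",      // 0
        "qwen3-vl:2b",    // 1
        "qwen3-vl:4b",    // 2
        "qwen3-vl:8b",    // 3
        "gemma3:4b",      // 4
        "gemma3:12b",     // 5
        "ministral3:3b",  // 6
        "ministral3:8b",  // 7
        "ministral3:14b"  // 8
    };

    if (int.TryParse(input, out int choice) && choice >= 0 && choice < menuModelIds.Length)
    {
        // Resolve a predefined model card to its download URI
        return ModelCard.GetPredefinedModelCardByModelID(menuModelIds[choice]).ModelUri;
    }

    // Anything else is treated as a custom model URI
    return input;
}
```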
Additional models loaded automatically:
- Embedding model: `embeddinggemma-300m` - used for semantic passage retrieval.
- Vision OCR model (when using document understanding): `lightonocr1025:1b` - lightweight vision model for page analysis.
🧠 Supported Models
The sample is pre-wired to LM-Kit's predefined model cards:
Chat models:
- `minicpm-o`
- `qwen3-vl:2b` / `qwen3-vl:4b` / `qwen3-vl:8b`
- `gemma3:4b` / `gemma3:12b`
- `ministral3:3b` / `ministral3:8b` / `ministral3:14b`

Embedding model: `embeddinggemma-300m`

Document understanding model: `lightonocr1025:1b`
Internally:

```csharp
// Chat model selection
modelUri = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri;

// Embedding model (auto-loaded)
LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

// Vision OCR for document understanding
var visionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));
```
You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.
🛠️ Commands & Flow
Startup Flow
- Model selection: choose a vision-capable chat model (0-8) or paste a custom URI.
- Model loading: chat and embedding models download (if needed) and load with progress reporting.
- Processing mode selection:
  - `0` - Standard text extraction (faster; uses Tesseract OCR as fallback).
  - `1` - Vision-based document understanding (better for complex layouts; loads an additional vision model).
- Document loading: prompted to enter PDF path(s). Can load multiple documents.
- Chat loop: ask questions, receive answers with source references.
Interactive Commands
Inside the chat loop, type these commands instead of a question:
| Command | Description |
|---|---|
| `/help` | Show all available commands with descriptions. |
| `/status` | Display loaded documents, token usage, and configuration details. |
| `/add` | Add more PDF documents to the current collection (clears chat history). |
| `/restart` | Clear chat history but keep all loaded documents. |
| `/reset` | Remove all documents and clear chat history (prompts for new documents). |
| `/regenerate` | Generate a new response to your last question. |
| (empty) | Press Enter on an empty prompt to exit the application. |
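The command dispatch inside the chat loop can be sketched like this. Only `ClearDocuments()`, `LoadDocumentAsync`, and `SubmitAsync` appear elsewhere in this README; the rest of the wiring is illustrative, not the sample's exact code:

```csharp
// Illustrative chat-loop sketch around an existing PdfChat instance `chat`.
while (true)
{
    Console.Write("> ");
    string input = Console.ReadLine()?.Trim() ?? string.Empty;

    if (input.Length == 0)
        break; // empty prompt exits the application

    if (input == "/reset")
    {
        chat.ClearDocuments(); // remove all documents (per the /reset command)
        // ...prompt the user for new documents here...
        continue;
    }

    if (input == "/add")
    {
        Console.Write("PDF path: ");
        string path = Console.ReadLine() ?? string.Empty;
        await chat.LoadDocumentAsync(path); // adding documents clears chat history
        continue;
    }

    // Any other input is treated as a question
    var answer = await chat.SubmitAsync(input);
    Console.WriteLine();
}
```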
Per-Question Flow
- Type your question and press Enter.
- Passage retrieval (if using chunked documents): relevant passages are retrieved with timing info.
- Response generation: the model generates an answer, streamed to console in real-time.
- Stats display: tokens generated, generation speed, and context utilization.
- Repeat or use a command.
🗣️ Example Use Cases
Try the sample with:
- A contract or legal agreement → ask "What are the termination conditions?" or "Summarize the payment terms."
- A research paper → ask "What methodology did the authors use?" or "What were the main findings?"
- A technical manual → ask "How do I configure the network settings?" or "What are the system requirements?"
- A financial report → ask "What was the revenue growth year-over-year?" or "Summarize the risk factors."
- Multiple related documents → load several PDFs and ask "Compare the approaches described in these papers."
After each response, inspect:
- Retrieved passages: do they match the most relevant sections?
- Source references: correct document and page numbers?
- Generation stats: acceptable latency for your use case?
📄 Document Processing Modes
Standard Text Extraction
Processing Mode: 0 - Standard text extraction (faster)
- Extracts text directly from PDF structure.
- Uses Tesseract OCR as fallback for scanned/image-based pages.
- Best for: clean PDFs with simple layouts, text-heavy documents.
Vision-Based Document Understanding
Processing Mode: 1 - Vision-based document understanding (better for complex layouts)
- Loads an additional vision model (`lightonocr1025:1b`).
- Analyzes pages visually to understand layout, structure, and relationships.
- Best for: multi-column layouts, tables, forms, scanned documents, mixed content.
📦 Document Indexing Modes
When you load a document, PdfChat automatically decides how to process it:
Full Document Mode
- When: the document fits within `FullDocumentTokenBudget` (default: 4096 tokens).
- Behavior: the entire document is included in every query context.
- Advantage: model has complete context, best for small documents.
- Trade-off: uses more context space, limiting room for conversation history.
Passage Retrieval Mode
- When: document exceeds the token budget.
- Behavior: document is chunked and indexed for semantic search.
- Advantage: handles documents of any size efficiently.
- Trade-off: only relevant passages are included (may miss tangentially related content).
The sample reports which mode was used after loading:

```
✓ Loaded: report.pdf
  Pages: 45
  Tokens: 28,500
  Mode: passage retrieval
  ⚠ Exceeded token budget → using passage retrieval
```
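The same information is available programmatically from the result of `LoadDocumentAsync`. A minimal sketch using the `DocumentIndexingResult` properties listed under "Notes on Key Types" below:

```csharp
// Inspect the indexing result after loading a document.
var result = await chat.LoadDocumentAsync("report.pdf");

Console.WriteLine($"Loaded: {result.Name}");
Console.WriteLine($"  Pages:  {result.PageCount}");
Console.WriteLine($"  Tokens: {result.TokenCount}");
Console.WriteLine($"  Mode:   {result.IndexingMode}"); // FullDocument or PassageRetrieval

if (result.ExceededTokenBudget)
    Console.WriteLine("  Exceeded FullDocumentTokenBudget; passage retrieval is used.");
```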
💾 Document Caching with IVectorStore
PdfChat supports persistent caching of processed documents through the IVectorStore interface. This eliminates redundant processing when the same document is loaded across sessions.
FileSystemVectorStore (Built-in)
The sample uses FileSystemVectorStore for local filesystem caching:
```csharp
using LMKit.Data.Storage;

string cacheDirectory = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "LMKit", "ChatWithPDF", "Cache");

IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

// Pass to the PdfChat constructor
PdfChat chat = new PdfChat(chatModel, embeddingModel, vectorStore);
```
Cache Behavior
- On cache hit: document loads instantly from pre-indexed data.
- On cache miss: full processing (page extraction, embedding generation) occurs, then results are cached.
Monitor cache access via the `CacheAccessed` event:

```csharp
chat.CacheAccessed += (sender, e) =>
{
    if (e.IsHit)
        Console.WriteLine($"Cache hit: {e.DocumentName} loaded instantly");
    else
        Console.WriteLine($"Cache miss: {e.DocumentName} will be processed");
};
```
Disabling Caching
To disable caching, omit the `vectorStore` parameter:

```csharp
// No caching - documents are processed fresh each time
PdfChat chat = new PdfChat(chatModel, embeddingModel);
```
Custom Vector Store Implementations
Implement `IVectorStore` for custom backends (databases, cloud storage, etc.):

```csharp
public class CustomVectorStore : IVectorStore
{
    // Implement the interface methods for your storage backend
    public Task<bool> CollectionExistsAsync(string collectionId, CancellationToken ct) { ... }

    // ... other methods
}
```
🏷️ Document Metadata
When loading documents, you can attach rich metadata for source tracking and attribution:
DocumentMetadata Class
```csharp
var metadata = new DocumentMetadata("Q4 Financial Report")
{
    SourceUri = "https://intranet.example.com/docs/q4-report.pdf",
    AdditionalMetadata = new MetadataCollection
    {
        { "author", "Finance Team" },
        { "department", "Corporate Finance" },
        { "confidentiality", "Internal" },
        { "fiscal_year", "2024" }
    }
};

var result = await chat.LoadDocumentAsync("report.pdf", metadata);
```
Accessing Metadata in Query Results
Source references include the metadata you specified:
```csharp
var response = await chat.SubmitAsync("What was the Q4 revenue?");

foreach (var source in response.SourceReferences)
{
    Console.WriteLine($"Source: {source.Name}, Page {source.PageNumber}");

    if (source.Metadata?.AdditionalMetadata != null)
    {
        if (source.Metadata.AdditionalMetadata.TryGet("author", out var author))
            Console.WriteLine($"  Author: {author.Value}");

        if (source.Metadata.AdditionalMetadata.TryGet("department", out var dept))
            Console.WriteLine($"  Department: {dept.Value}");
    }
}
```
Default Metadata
If no metadata is provided, PdfChat creates default metadata using the file name:
```csharp
// Equivalent to: new DocumentMetadata(Path.GetFileName(filePath))
var result = chat.LoadDocument("report.pdf");
```
🔧 Advanced Configuration
Reranking for Improved Retrieval
Use a cross-encoder reranker to improve passage retrieval accuracy:
```csharp
using LMKit.Retrieval;

// Load a reranker model
var rerankerModel = LM.LoadFromModelID("reranker-model-id");

// Configure reranking
chat.Reranker = new RagEngine.RagReranker(rerankerModel, rerankAlpha: 0.7f);
```
The `RerankAlpha` parameter (0.0–1.0) controls blending between raw embedding similarity and the rerank score:

- `0.0`: use only embedding similarity
- `1.0`: use only the rerank score
- `0.7` (recommended): a blend favoring the rerank score
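The exact combination used internally by `RagReranker` is not shown here, but a linear interpolation consistent with the endpoints described above would look like this (a hypothetical illustration, not library code):

```csharp
// Hypothetical alpha blending; the actual RagReranker formula is internal.
static float BlendScore(float embeddingSimilarity, float rerankScore, float alpha)
{
    // alpha = 0.0 -> pure embedding similarity; alpha = 1.0 -> pure rerank score
    return (1f - alpha) * embeddingSimilarity + alpha * rerankScore;
}

// e.g. BlendScore(0.80f, 0.40f, 0.7f) ≈ 0.52 (0.3 * 0.8 + 0.7 * 0.4)
```

Under this reading, a high `alpha` lets a strong cross-encoder judgment override a mediocre embedding match, while a low `alpha` keeps ranking close to the raw semantic search order.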
Visual Grounding with Page Renderings
Include page images alongside retrieved passages for visual context:
```csharp
// Enable page renderings in the retrieval context
chat.IncludePageRenderingsInContext = true;
```
When enabled:
- Page images corresponding to retrieved passages are injected into the context.
- Allows the model to visually interpret tables, charts, and figures.
- Increases token consumption proportionally to unique pages referenced.
- Requires a vision-capable chat model for best results.
Document Processing Modality
Control whether pages are processed as images or text-only:
```csharp
// Multimodal: pages may be processed as images (default)
chat.DocumentProcessingModality = InferenceModality.Multimodal;

// Text-only: use extracted text only
chat.DocumentProcessingModality = InferenceModality.Text;
```
Note: This property must be set before loading documents.
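For example, a sketch of the required ordering (using `ClearDocuments()`, which the troubleshooting section also mentions for reconfiguration):

```csharp
// Correct order: configure the modality first, then load documents.
chat.DocumentProcessingModality = InferenceModality.Text;
await chat.LoadDocumentAsync("report.pdf");

// To change it later, clear documents first, then reconfigure and reload.
chat.ClearDocuments();
chat.DocumentProcessingModality = InferenceModality.Multimodal;
await chat.LoadDocumentAsync("report.pdf");
```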
Retrieval Parameters
Fine-tune passage retrieval behavior:
```csharp
// Maximum passages retrieved per query (default: 5)
chat.MaxRetrievedPassages = 10;

// Minimum relevance score threshold (default: 0.5)
// Higher = fewer but more precise matches
chat.MinRelevanceScore = 0.6f;

// Token budget for full-document mode (default: 4096)
// Documents exceeding this use passage retrieval
chat.FullDocumentTokenBudget = 8192;

// Prefer including small documents in full (default: true)
chat.PreferFullDocumentContext = true;
```
Reasoning and Sampling
Configure response generation behavior:
```csharp
// Enable extended reasoning for complex questions
chat.ReasoningLevel = ReasoningLevel.Extended;

// Use temperature-based sampling for varied responses
chat.SamplingMode = new RandomSampling { Temperature = 0.7f };

// Or stick with deterministic outputs (default)
chat.SamplingMode = new GreedyDecoding();

// Maximum tokens per response (default: 2048)
chat.MaximumCompletionTokens = 4096;
```
⚙️ Behavior & Policies (quick reference)
- Model selection: exactly one chat model per process. To change models, restart the app.
- Download & load:
  - `ModelDownloadingProgress` prints `Downloading model XX.XX%` or byte counts.
  - `ModelLoadingProgress` prints `Loading model XX%` and clears the console once done.
- Document caching:
  - Uses the `IVectorStore` interface (the sample uses `FileSystemVectorStore`).
  - Default cache location: `%LocalAppData%/LMKit/ChatWithPDF/Cache`.
  - On cache hit: instant load from pre-indexed data.
  - On cache miss: full processing (page extraction, embedding generation).
- Passage retrieval:
  - Default `MaxRetrievedPassages`: 5.
  - Default `MinRelevanceScore`: 0.5.
  - Results are sorted by document, then page number, then partition index.
- Response generation:
  - Streaming output via the `AfterTextCompletion` event.
  - Internal reasoning is shown in dark blue.
  - Tool invocations are shown in dark yellow.
- Licensing:
  - Set an optional license key via `LicenseManager.SetLicenseKey("")`.
  - A free community license is available from the LM-Kit website.
💻 Minimal Integration Snippet
```csharp
using LMKit.Data.Storage;
using LMKit.Extraction.Ocr;
using LMKit.Model;
using LMKit.Retrieval;

public class PdfChatSample
{
    public async Task RunChat(string modelUri, string pdfPath)
    {
        // Load models
        var chatModel = new LM(new Uri(modelUri));
        var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

        // Configure caching with a vector store
        string cacheDirectory = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "LMKit", "PdfChat", "Cache");
        IVectorStore vectorStore = new FileSystemVectorStore(cacheDirectory);

        // Create the PdfChat instance with caching
        using var chat = new PdfChat(chatModel, embeddingModel, vectorStore)
        {
            PreferFullDocumentContext = true
        };

        // Optional: enable vision-based document understanding
        chat.PageProcessingMode = PageProcessingMode.DocumentUnderstanding;
        chat.DocumentVisionParser = new VlmOcr(LM.LoadFromModelID("lightonocr1025:1b"));

        // Optional: include page images in the retrieval context
        // chat.IncludePageRenderingsInContext = true;

        // Subscribe to events
        chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);
        chat.PassageRetrievalCompleted += (s, e) =>
            Console.WriteLine($"Retrieved {e.RetrievedCount} passages in {e.Elapsed.TotalMilliseconds:F0}ms");

        // Load the document with optional metadata
        var metadata = new DocumentMetadata(Path.GetFileName(pdfPath))
        {
            SourceUri = pdfPath
        };
        var result = await chat.LoadDocumentAsync(pdfPath, metadata);
        Console.WriteLine($"Loaded: {result.Name} ({result.IndexingMode})");

        // Ask questions
        var response = await chat.SubmitAsync("What are the key points in this document?");
        Console.WriteLine();
        Console.WriteLine($"Tokens: {response.Response.GeneratedTokens.Count}");
        Console.WriteLine($"Speed: {response.Response.TokenGenerationRate:F1} tok/s");
    }
}
```
Use this pattern to integrate PDF chat into web APIs, desktop apps, or document processing pipelines.
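For the web API case, one possible shape is a minimal ASP.NET Core host around a shared `PdfChat` instance. The endpoint routes and DTOs below are illustrative assumptions, not part of the sample; only `PdfChat`, `LoadDocumentAsync`, `SubmitAsync`, and the `Completion` property of `TextGenerationResult` come from this README. Since `PdfChat` holds conversation state, access is serialized with a semaphore:

```csharp
// Hypothetical ASP.NET Core minimal API wrapper (sketch, not the sample's code).
using LMKit.Model;
using LMKit.Retrieval;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var chatModel = new LM(new Uri("<model-uri>"));
var embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
var chat = new PdfChat(chatModel, embeddingModel);

// PdfChat maintains conversation state, so serialize access to it.
var gate = new SemaphoreSlim(1, 1);

app.MapPost("/documents", async (string path) =>
{
    await gate.WaitAsync();
    try { return Results.Ok(await chat.LoadDocumentAsync(path)); }
    finally { gate.Release(); }
});

app.MapPost("/ask", async (string question) =>
{
    await gate.WaitAsync();
    try
    {
        var result = await chat.SubmitAsync(question);
        return Results.Ok(new { answer = result.Response.Completion });
    }
    finally { gate.Release(); }
});

app.Run();
```

A production service would likely create one `PdfChat` per user session rather than sharing a single conversation across callers.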
🛠️ Getting Started
📋 Prerequisites
- .NET Framework 4.6.2 or .NET 8.0+
📥 Download
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/chat_with_pdf
```
Project Link: chat_with_pdf (same path as above)
▶️ Run
```bash
dotnet build
dotnet run
```
Then:
- Select a chat model by typing 0-8, or paste a custom model URI.
- Wait for models to download (first run) and load.
- Select a processing mode: `0` for standard extraction, `1` for vision understanding.
- Enter the path to a PDF file when prompted.
- Optionally load additional documents (answer `y` to "Load another document?").
- Start asking questions about your documents.
- Use commands (`/help`, `/status`, `/add`, etc.) as needed.
- Press Enter on an empty prompt to exit.
📘 Notes on Key Types
Core Classes
`PdfChat` (LMKit.Retrieval) - main class for PDF question answering:
- Manages document loading, indexing, and conversation state.
- Automatically chooses between full-context and passage retrieval.
- Supports multi-turn conversation with context preservation.
- Implements `IMultiTurnConversation` and `IDisposable`.

`IVectorStore` (LMKit.Data.Storage) - interface for embedding storage:
- `FileSystemVectorStore`: built-in filesystem-based implementation.
- Enables persistent caching of document embeddings across sessions.
- Implement custom backends for databases or cloud storage.

`DocumentMetadata` - rich metadata for source tracking:
- `Name`: display name for the document.
- `SourceUri`: original location or reference URL.
- `AdditionalMetadata`: custom key-value pairs for attribution.
Result Types
`DocumentIndexingResult` - returned when loading a document:
- `Name`: the document name (from metadata or the file name).
- `IndexingMode`: `FullDocument` or `PassageRetrieval`.
- `PageCount`: total pages in the document.
- `TokenCount`: estimated tokens for the document.
- `ExceededTokenBudget`: whether it exceeded the full-context limit.

`DocumentQueryResult` - returned when asking a question:
- `Response`: the `TextGenerationResult` with the completion and stats.
- `SourceReferences`: list of `DocumentReference` objects (document name, page number, metadata).
- `HasSourceReferences`: whether passages were used (vs. full context).
Processing Classes
`VlmOcr` (LMKit.Extraction.Ocr) - vision-based document parser:
- Analyzes page images to understand structure and content.
- Set via the `chat.DocumentVisionParser` property.

`TesseractOcr` (LMKit.Integrations.Tesseract) - OCR fallback engine:
- Extracts text from image-based pages.
- Set via the `chat.OcrEngine` property.

`RagEngine.RagReranker` (LMKit.Retrieval) - cross-encoder reranker:
- Improves retrieval accuracy with learned relevance scoring.
- Set via the `chat.Reranker` property.
Key Events
| Event | Description |
|---|---|
| `DocumentImportProgress` | Page processing and embedding generation progress. |
| `CacheAccessed` | Cache hit/miss notifications with the document identifier. |
| `PassageRetrievalCompleted` | Retrieval results with timing and source references. |
| `ResponseGenerationStarted` | Signals when generation begins; includes the context mode. |
| `AfterTextCompletion` | Streaming text output during response generation. |
⚠️ Troubleshooting
"No documents have been loaded"
- You must load at least one PDF before asking questions.
- Check the file path and try again.
"File not found"
- Verify the PDF path is correct and the file exists.
- Try using absolute paths or quoting paths with spaces.
Slow document loading
- First load processes and indexes the document (expected).
- Subsequent loads use cache and should be near-instant.
- Try a smaller model or standard extraction mode for faster processing.
Out-of-memory or driver errors
- VRAM insufficient for selected model(s).
- Pick smaller models (e.g., Qwen 3 2B, Ministral 3 3B).
- Use standard extraction instead of vision understanding.
Poor answer quality
- Try a larger chat model for better reasoning.
- Use vision-based document understanding for complex layouts.
- Increase `MaxRetrievedPassages` for more context.
- Lower `MinRelevanceScore` to include more passages.
- Enable `IncludePageRenderingsInContext` for visual grounding.
- Add a `Reranker` for improved passage selection.
Missing passages in answers
- Document may be using full-context mode (no passage retrieval).
- Check with the `/status` command to see document modes.
- For large documents, relevant content may not be in the top-k passages.
Cache issues
- Delete the cache directory to force reprocessing.
- Default location: `%LocalAppData%/LMKit/ChatWithPDF/Cache`.
- Verify `IVectorStore` is configured correctly.
"Cannot modify this setting after documents have been loaded"
- Some properties (e.g., `DocumentProcessingModality`) must be set before loading documents.
- Call `ClearDocuments()` first, then reconfigure.
"DocumentVisionParser must be set when PageProcessingMode is DocumentUnderstanding"
- Assign a `VlmOcr` instance before loading documents when using vision mode.
🔧 Extend the Demo
- Web API integration: expose PdfChat as a REST endpoint for document QA services.
- Batch processing: process entire document libraries and build searchable knowledge bases.
- Custom system prompts: tailor the assistant's behavior for specific domains (legal, medical, technical).
- Custom vector stores: implement `IVectorStore` for database-backed caching (PostgreSQL, Redis, etc.).
- Advanced retrieval:
  - Adjust `MaxRetrievedPassages` and `MinRelevanceScore` for your use case.
  - Add a `Reranker` for cross-encoder relevance scoring.
  - Enable `IncludePageRenderingsInContext` for visual grounding.
  - Implement custom reranking logic using the `PassageRetrievalCompleted` event.
- Conversation export: save chat history for audit trails or further analysis.
- Multi-modal responses: combine with image generation or charting for visual answers.
- Integration with other LM-Kit features:
- Chain with Text Analysis for entity extraction from answers.
- Use Structured Extraction to pull specific data fields.
- Connect to Function Calling for automated document workflows.