How Do I Build a Private Document Q&A System?
TL;DR
LM-Kit.NET provides two approaches: PdfChat, for quick setup with automatic size-based routing (full-document context for small files, passage retrieval for large ones), and RagEngine, for fully customizable RAG pipelines. Both run entirely on your machine with no cloud dependency. Supported formats include PDF, DOCX, XLSX, PPTX, HTML, EML, MBOX, and images (via OCR).
Approach 1: PdfChat (Quick Setup)
PdfChat is a ready-to-use conversational Q&A class that handles document loading, chunking, retrieval, and multi-turn chat automatically:
using LMKit.Model;
using LMKit.Retrieval;
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
var pdfChat = new PdfChat(chatModel, embeddingModel);
// Load one or more documents
pdfChat.LoadDocument("quarterly-report.pdf");
pdfChat.LoadDocument("product-manual.docx");
// Ask questions (multi-turn, conversational)
string answer1 = pdfChat.Submit("What was the Q3 revenue?");
string answer2 = pdfChat.Submit("How does that compare to Q2?");
// Source references from the last answer
foreach (var source in pdfChat.LastSourceReferences)
    Console.WriteLine($"  Source: {source.DocumentName}, Page {source.PageNumber}");
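To picture what the retrieval step does for larger documents, here is a minimal sketch of embedding search in plain C#. It is not LM-Kit.NET's implementation: the helper names (`Cosine`, `TopPassages`), the use of cosine similarity, and the three-dimensional toy vectors are all assumptions made for illustration. Real embeddings come from the embedding model and have hundreds of dimensions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper: return the k passage labels most similar to the query vector.
static List<string> TopPassages(
    float[] queryVector,
    IReadOnlyList<(string Label, float[] Embedding)> candidates,
    int k) =>
    candidates.OrderByDescending(c => Cosine(queryVector, c.Embedding))
              .Take(k)
              .Select(c => c.Label)
              .ToList();

// Toy data: labels stand in for passage text, vectors stand in for embeddings.
var passages = new List<(string Label, float[] Embedding)>
{
    ("passage-a", new float[] { 0.9f, 0.1f, 0.0f }),
    ("passage-b", new float[] { 0.0f, 0.2f, 0.9f }),
    ("passage-c", new float[] { 0.8f, 0.3f, 0.1f }),
};
float[] queryEmbedding = { 1.0f, 0.2f, 0.0f };

List<string> top = TopPassages(queryEmbedding, passages, 2);
Console.WriteLine(string.Join(", ", top)); // passage-a, passage-c
```

Only the top-ranked passages would then be placed in the prompt, which is what keeps large documents within the model's context window.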
Automatic Size-Based Routing
PdfChat automatically picks a strategy for each document based on its token count:
| Document Size | Strategy | How It Works |
|---|---|---|
| Under the token budget (4096 by default) | Full-document context | Entire document included in the prompt |
| Over the token budget | Passage retrieval | Only relevant passages retrieved via embedding search |
The threshold is configurable:
pdfChat.FullDocumentTokenBudget = 8192; // Raise for larger full-doc inclusion
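The routing rule in the table can be sketched as a single comparison. The helper `ChooseStrategy` below is hypothetical and only mirrors the documented behavior. How a document that lands exactly on the budget is handled is not stated above, so the `<=` is an assumption.

```csharp
using System;

// Hypothetical helper: pick a strategy from a document's token count and the budget.
// Assumption: a document exactly at the budget still counts as fitting in the prompt.
static string ChooseStrategy(int documentTokens, int fullDocumentTokenBudget) =>
    documentTokens <= fullDocumentTokenBudget ? "full-document" : "passage-retrieval";

Console.WriteLine(ChooseStrategy(3000, 4096)); // full-document
Console.WriteLine(ChooseStrategy(6000, 4096)); // passage-retrieval
Console.WriteLine(ChooseStrategy(6000, 8192)); // full-document
```

The last call shows the effect of raising the budget: the same 6000-token document moves from passage retrieval to full-document context.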
Query Generation Modes
For multi-turn conversations, PdfChat can reformulate follow-up questions:
| Mode | Behavior |
|---|---|
| Original (default) | Use the question as-is |
| Contextual | Reformulate using conversation history ("How about Q2?" → "What was the Q2 revenue?") |
| MultiQuery | Generate multiple query variants for broader retrieval |
| HypotheticalAnswer | Generate a hypothetical answer first, then retrieve (HyDE) |
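As an illustration of why MultiQuery broadens retrieval, the sketch below merges the hits returned for several query variants and keeps each passage's best score. This is one common way to combine such results, not necessarily what PdfChat does internally. `MergeHits`, the passage IDs, and the scores are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: merge hits from several query variants,
// keeping the highest score seen for each passage.
static List<(string PassageId, double Score)> MergeHits(
    IEnumerable<IEnumerable<(string PassageId, double Score)>> hitsPerVariant) =>
    hitsPerVariant
        .SelectMany(hits => hits)
        .GroupBy(hit => hit.PassageId)
        .Select(group => (PassageId: group.Key, Score: group.Max(hit => hit.Score)))
        .OrderByDescending(hit => hit.Score)
        .ToList();

// Toy results for two variants of the same question.
var variantHits = new List<List<(string PassageId, double Score)>>
{
    new() { ("p1", 0.82), ("p4", 0.61) },
    new() { ("p4", 0.77), ("p7", 0.58) },
};

var merged = MergeHits(variantHits);
foreach (var hit in merged)
    Console.WriteLine($"{hit.PassageId}: {hit.Score}");
```

Passage `p7` is found only by the second variant, so a single-query search would have missed it. `p4` appears once in the merged list, with the higher of its two scores.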
Approach 2: RagEngine (Full Control)
For custom RAG pipelines with fine-grained control over chunking, retrieval, and reranking:
using LMKit.Model;
using LMKit.Retrieval;
using LMKit.Embeddings;
using LMKit.TextGeneration;
// Chat model that generates the answers; embedding model that indexes the documents
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
using LM embeddingModel = LM.LoadFromModelID("bge-m3");
var ragEngine = new RagEngine(embeddingModel);
// Customize chunking (set before importing so it applies to the imports below)
ragEngine.DefaultChunking = new TextChunking
{
    MaxChunkSize = 300,  // Tokens per chunk
    MaxOverlapSize = 50  // Overlap for context preservation
};
// Import documents with automatic chunking
ragEngine.ImportDocument("knowledge-base/manual.pdf");
ragEngine.ImportDocument("knowledge-base/faq.html");
// Optional: Add a reranker for higher precision
using LM rerankerModel = LM.LoadFromModelID("bge-m3-reranker");
ragEngine.Reranker = new Reranker(rerankerModel);
// Query with a conversation
var chat = new MultiTurnConversation(chatModel);
string answer = ragEngine.QueryWithContext(chat, "What are the safety requirements?");
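To see how a chunk size of 300 and an overlap of 50 interact, here is a minimal sketch that computes fixed-size token windows. It is a simplification under stated assumptions: `ChunkRanges` is a hypothetical helper, and a real chunker such as TextChunking may also respect sentence or paragraph boundaries rather than cutting at exact token offsets.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: compute [Start, End) token ranges for fixed-size chunks
// where each chunk repeats the last `overlap` tokens of the previous one.
static List<(int Start, int End)> ChunkRanges(int totalTokens, int maxChunkSize, int overlap)
{
    if (maxChunkSize <= 0 || overlap < 0 || overlap >= maxChunkSize)
        throw new ArgumentException("overlap must be non-negative and smaller than maxChunkSize");

    var ranges = new List<(int Start, int End)>();
    int stride = maxChunkSize - overlap; // how far each new chunk advances
    for (int start = 0; start < totalTokens; start += stride)
    {
        int end = Math.Min(start + maxChunkSize, totalTokens);
        ranges.Add((start, end));
        if (end == totalTokens) break; // the last chunk reached the end of the document
    }
    return ranges;
}

var ranges = ChunkRanges(700, 300, 50);
foreach (var range in ranges)
    Console.WriteLine($"tokens {range.Start}..{range.End}");
// tokens 0..300
// tokens 250..550
// tokens 500..700
```

Each chunk advances by 250 tokens (300 minus 50), so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.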
Supported Document Formats
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Native text extraction + optional OCR for scanned pages |
| Word | .docx | Full text and table extraction |
| Excel | .xlsx | Cell-level data extraction |
| PowerPoint | .pptx | Slide text extraction |
| HTML | .html | Structure-aware parsing |
| Email | .eml | Headers, body, and attachments |
| Mail archive | .mbox | Multi-message archive processing |
| Images | .png, .jpg, .tiff, etc. | Via VLM OCR or LM-Kit OCR |
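When importing a whole folder, it can help to filter file paths against the extensions listed in the table before passing them to a loader. The sketch below uses only the .NET base library. `FilterSupported` is a hypothetical helper, and the set contains only the extensions named explicitly above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions taken from the table above (case-insensitive match).
var supported = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".pdf", ".docx", ".xlsx", ".pptx", ".html", ".eml", ".mbox", ".png", ".jpg", ".tiff",
};

// Hypothetical helper: keep only the paths whose extension is in the supported set.
List<string> FilterSupported(IEnumerable<string> paths) =>
    paths.Where(p => supported.Contains(Path.GetExtension(p))).ToList();

var candidates = new[] { "manual.PDF", "faq.html", "notes.txt", "inbox.mbox" };
List<string> importable = FilterSupported(candidates);
Console.WriteLine(string.Join(", ", importable)); // manual.PDF, faq.html, inbox.mbox
```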
Privacy: Everything Stays Local
Both PdfChat and RagEngine run entirely on your machine:
- No cloud calls. Documents are processed locally.
- No data leaves your network. Embeddings are computed locally.
- No API keys required. Models run via local inference.
This makes LM-Kit.NET ideal for sensitive documents: legal contracts, medical records, financial reports, and proprietary code.
📚 Related Content
- What embedding models does LM-Kit.NET support?: Choose the right embedding model for your Q&A system.
- How do I handle documents larger than the model's context window?: Chunking and overflow strategies.
- What OCR options does LM-Kit.NET provide?: Process scanned documents in your Q&A pipeline.
- Build a RAG Pipeline Over Your Own Documents: Step-by-step RAG implementation guide.