
How Do I Build a Private Document Q&A System?


TL;DR

LM-Kit.NET offers two approaches: PdfChat for quick setup, with automatic size-based routing (full-document context for small files, passage retrieval for large ones), and RagEngine for fully customizable RAG pipelines. Both run entirely on your machine, with no cloud dependency. Supported formats include PDF, DOCX, XLSX, PPTX, HTML, EML, MBOX, and images (via OCR).


Approach 1: PdfChat (Quick Setup)

PdfChat is a ready-to-use conversational Q&A class that handles document loading, chunking, retrieval, and multi-turn chat automatically:

using System;
using LMKit.Model;
using LMKit.Retrieval;

// The chat model answers questions; the embedding model indexes passages for retrieval
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");

var pdfChat = new PdfChat(chatModel, embeddingModel);

// Load one or more documents
pdfChat.LoadDocument("quarterly-report.pdf");
pdfChat.LoadDocument("product-manual.docx");

// Ask questions; follow-ups can refer back to earlier turns
string answer1 = pdfChat.Submit("What was the Q3 revenue?");
string answer2 = pdfChat.Submit("How does that compare to Q2?");

// Print the source references behind the last answer
foreach (var source in pdfChat.LastSourceReferences)
    Console.WriteLine($"  Source: {source.DocumentName}, Page {source.PageNumber}");

Automatic Size-Based Routing

PdfChat chooses a strategy for each document based on its token count:

Document size | Strategy | How it works
Fits within the token budget (default: 4096 tokens) | Full-document context | The entire document is included in the prompt
Exceeds the token budget | Passage retrieval | Only the most relevant passages, found via embedding search, are included

The budget is configurable. Raising it lets larger documents be included whole, at the cost of longer prompts:

pdfChat.FullDocumentTokenBudget = 8192;  // Include documents of up to 8192 tokens in full
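To make the routing rule concrete, here is a minimal, self-contained C# sketch of the same decision. It is not LM-Kit.NET's implementation: `BuildContext`, `EstimateTokens`, and the 4-characters-per-token estimate are hypothetical, toy 3-dimensional vectors stand in for real embeddings, and ranking passages by cosine similarity is assumed here as a common choice for embedding search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of size-based routing; not LM-Kit.NET's internal code.
// Assumption: ~4 characters per token. A real implementation would count
// tokens with the chat model's tokenizer.
const int FullDocumentTokenBudget = 4096;

int EstimateTokens(string text) => (text.Length + 3) / 4;

// Cosine similarity between two embedding vectors of equal length.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Returns the context to place in the prompt for one document:
// the whole text if it fits the budget, otherwise the topK passages
// whose embeddings are most similar to the query embedding.
string BuildContext(string document,
                    IReadOnlyList<(string Text, float[] Embedding)> passages,
                    float[] queryEmbedding,
                    int topK = 3)
{
    if (EstimateTokens(document) <= FullDocumentTokenBudget)
        return document;

    var best = passages
        .OrderByDescending(p => Cosine(p.Embedding, queryEmbedding))
        .Take(topK)
        .Select(p => p.Text);
    return string.Join("\n---\n", best);
}

// Toy 3-dimensional vectors stand in for real embeddings.
var docPassages = new List<(string Text, float[] Embedding)>
{
    ("Q3 revenue was reported in section 4.", new float[] { 0.9f, 0.1f, 0f }),
    ("The cafeteria menu changed in May.", new float[] { 0f, 0.2f, 0.9f }),
};
var queryVec = new float[] { 1f, 0f, 0f };

string small = BuildContext("Short memo.", docPassages, queryVec);
string large = BuildContext(new string('x', 20_000), docPassages, queryVec, topK: 1);
Console.WriteLine(small); // Short memo.
Console.WriteLine(large); // Q3 revenue was reported in section 4.
```

The short memo fits the budget and is passed through unchanged; the 20,000-character document (about 5,000 estimated tokens) exceeds it, so only the passage closest to the query is kept.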

Query Generation Modes

PdfChat can also change how the retrieval query is built from each question, which matters most for follow-ups in multi-turn conversations:

Mode | Behavior
Original (default) | Use the question as-is
Contextual | Reformulate the question using conversation history ("How about Q2?" → "What was the Q2 revenue?")
MultiQuery | Generate multiple query variants for broader retrieval
HypotheticalAnswer | Generate a hypothetical answer first, then retrieve passages similar to it (HyDE)
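MultiQuery broadens recall because a passage only needs to match one phrasing of the question. The C# sketch below illustrates that effect; it is not PdfChat's code. Assumptions: the query variants are hard-coded (PdfChat would have the chat model generate them), a word-overlap score stands in for embedding similarity, and the hypothetical `MultiQueryRetrieve` merges results by keeping each passage's best score, which is only one possible fusion rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of multi-query retrieval; not PdfChat's implementation.
// A word-overlap score stands in for embedding similarity so the example runs
// without a model.
string[] corpus =
{
    "Q3 revenue reached 12 million.",
    "Third-quarter sales grew versus the prior quarter.",
    "The office moved to a new building.",
};

// Lowercased words of a string, ignoring basic punctuation.
HashSet<string> Words(string text) =>
    new(text.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', '?', '-' }, StringSplitOptions.RemoveEmptyEntries));

// Fraction of the query's words that also appear in the passage.
double Score(string query, string passage)
{
    var queryWords = Words(query);
    var passageWords = Words(passage);
    return queryWords.Count == 0
        ? 0
        : (double)queryWords.Count(w => passageWords.Contains(w)) / queryWords.Count;
}

// Retrieve with every query variant, keep each passage's best score,
// and return the topK passages overall.
List<string> MultiQueryRetrieve(IEnumerable<string> variants, int topK)
{
    var bestScores = new Dictionary<string, double>();
    foreach (var variant in variants)
    {
        foreach (var passage in corpus)
        {
            double score = Score(variant, passage);
            if (score > 0 && (!bestScores.TryGetValue(passage, out var previous) || score > previous))
                bestScores[passage] = score;
        }
    }
    return bestScores
        .OrderByDescending(kv => kv.Value)
        .Take(topK)
        .Select(kv => kv.Key)
        .ToList();
}

// Variants a chat model might generate for "How did Q3 go?"
var queryVariants = new[] { "Q3 revenue", "third quarter sales" };
List<string> hits = MultiQueryRetrieve(queryVariants, topK: 2);
List<string> single = MultiQueryRetrieve(new[] { "Q3 revenue" }, topK: 2);

Console.WriteLine($"One query: {single.Count} passage(s); two variants: {hits.Count} passage(s)");
foreach (var h in hits) Console.WriteLine(h);
```

With only "Q3 revenue", the sales passage scores zero and is missed; the second phrasing surfaces it, while the unrelated office passage stays out.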

Approach 2: RagEngine (Full Control)

For custom RAG pipelines with fine-grained control over chunking, retrieval, and reranking:

using LMKit.Model;
using LMKit.Retrieval;
using LMKit.Embeddings;
using LMKit.TextGeneration;

// Embedding model used to index and search document chunks
using LM embeddingModel = LM.LoadFromModelID("bge-m3");
var ragEngine = new RagEngine(embeddingModel);

// Configure chunking before importing, so the settings apply to every document
ragEngine.DefaultChunking = new TextChunking
{
    MaxChunkSize = 300,    // Tokens per chunk
    MaxOverlapSize = 50    // Tokens repeated between consecutive chunks to preserve context
};

// Import documents with automatic chunking
ragEngine.ImportDocument("knowledge-base/manual.pdf");
ragEngine.ImportDocument("knowledge-base/faq.html");

// Optional: add a reranker for higher precision
using LM rerankerModel = LM.LoadFromModelID("bge-m3-reranker");
ragEngine.Reranker = new Reranker(rerankerModel);

// Chat model that writes the answer from the retrieved passages
using LM chatModel = LM.LoadFromModelID("qwen3.5:9b");
var chat = new MultiTurnConversation(chatModel);
string answer = ragEngine.QueryWithContext(chat, "What are the safety requirements?");
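The `MaxChunkSize` and `MaxOverlapSize` settings can be pictured as a sliding window: each chunk holds at most 300 tokens and consecutive chunks share 50, so the window advances 250 tokens at a time. The sketch below is a simplified illustration under stated assumptions: the `Chunk` helper is hypothetical, dummy strings stand in for model tokens, and windows ignore sentence boundaries. LM-Kit.NET's actual chunker may split differently.

```csharp
using System;
using System.Collections.Generic;

// Simplified sliding-window chunker illustrating MaxChunkSize / MaxOverlapSize.
// Assumptions: dummy strings stand in for model tokens, and windows ignore
// sentence boundaries; LM-Kit.NET's own chunker may split differently.
List<string[]> Chunk(string[] tokens, int maxChunkSize, int maxOverlapSize)
{
    if (maxChunkSize <= 0 || maxOverlapSize < 0 || maxOverlapSize >= maxChunkSize)
        throw new ArgumentException("Require 0 <= overlap < chunk size.");

    var result = new List<string[]>();
    int stride = maxChunkSize - maxOverlapSize; // how far each window advances
    for (int start = 0; start < tokens.Length; start += stride)
    {
        int length = Math.Min(maxChunkSize, tokens.Length - start);
        result.Add(tokens[start..(start + length)]);
        if (start + length == tokens.Length) break; // this window reached the end
    }
    return result;
}

// 700 dummy tokens, chunked with the settings from the RagEngine example (300 / 50)
var docTokens = new string[700];
for (int i = 0; i < docTokens.Length; i++) docTokens[i] = $"t{i}";

var pieces = Chunk(docTokens, maxChunkSize: 300, maxOverlapSize: 50);
foreach (var c in pieces)
    Console.WriteLine($"{c[0]}..{c[^1]} ({c.Length} tokens)");
// t0..t299 (300 tokens)
// t250..t549 (300 tokens)
// t500..t699 (200 tokens)
```

Because neighboring windows share 50 tokens, any passage of up to 50 tokens that straddles a chunk boundary still appears whole in at least one chunk.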

Supported Document Formats

Format | Extension | Notes
PDF | .pdf | Native text extraction, plus optional OCR for scanned pages
Word | .docx | Full text and table extraction
Excel | .xlsx | Cell-level data extraction
PowerPoint | .pptx | Slide text extraction
HTML | .html | Structure-aware parsing
Email | .eml | Headers, body, and attachments
Mail archive | .mbox | Multi-message archive processing
Images | .png, .jpg, .tiff, etc. | Via VLM OCR or LM-Kit OCR
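When pointing either approach at a folder, it can help to skip files whose formats are not in the table. The helper below is hypothetical and not part of LM-Kit.NET; its extension map copies only the extensions named in the table, and because the image row ends in "etc.", other image types are left out rather than guessed. Extend the map to match what your LM-Kit.NET version supports.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

#nullable enable

// Hypothetical helper: map a file extension to a format family from the table.
// Only extensions named in the table are listed; add other image types as needed.
var formats = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    [".pdf"] = "PDF", [".docx"] = "Word", [".xlsx"] = "Excel",
    [".pptx"] = "PowerPoint", [".html"] = "HTML", [".eml"] = "Email",
    [".mbox"] = "Mail archive", [".png"] = "Image", [".jpg"] = "Image",
    [".tiff"] = "Image",
};

// Returns the format family, or null if the extension is not listed.
string? FormatOf(string path) =>
    formats.TryGetValue(Path.GetExtension(path) ?? "", out var family) ? family : null;

// Keeps only the files whose extension appears in the map.
IEnumerable<string> Importable(IEnumerable<string> paths) =>
    paths.Where(file => FormatOf(file) is not null);

var listing = new[] { "manual.PDF", "notes.txt", "faq.html", "scan.tiff", "data.csv" };
foreach (var name in Importable(listing))
    Console.WriteLine($"{name} -> {FormatOf(name)}");
```

Matching is case-insensitive, so "manual.PDF" is kept, while "notes.txt" and "data.csv" are skipped.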

Privacy: Everything Stays Local

Both PdfChat and RagEngine run entirely on your machine:

  • No cloud calls during processing. Documents are parsed, chunked, and queried locally.
  • No data leaves your network. Embeddings are computed locally.
  • No API keys required. Models run via local inference.

This makes LM-Kit.NET well suited to sensitive documents such as legal contracts, medical records, financial reports, and proprietary code.

