Chat with PDF Documents

PdfChat is a high-level API that turns PDF documents into interactive Q&A sessions. It handles parsing, chunking, embedding, retrieval, and response generation behind a single class. You load a PDF, ask questions in natural language, and get answers grounded in the document content, with source references pointing to specific pages.

This tutorial builds a working PDF Q&A system from scratch: loading models, indexing a document, running an interactive chat loop, and configuring retrieval quality.


Why PDF Chat Matters

Two enterprise problems that local PDF Q&A solves:

  1. Legal document review. Attorneys and compliance teams review contracts, regulatory filings, and court documents that contain sensitive client information. A local PDF Q&A system lets them query hundreds of pages instantly without uploading confidential material to third-party services.
  2. Technical manual Q&A. Engineers, field technicians, and support staff need quick answers from equipment manuals, safety data sheets, and installation guides. A local system runs offline on a laptop, delivering precise answers with page references even in disconnected environments.

Prerequisites

Requirement   Minimum
.NET SDK      8.0+
RAM           16 GB recommended
VRAM          6 GB (for both embedding and chat models)
Disk          ~4 GB free for model downloads
PDF files     At least one .pdf file to test with

Step 1: Create the Project

dotnet new console -n PdfChatQuickstart
cd PdfChatQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand PdfChat Architecture

PdfChat wraps document parsing, chunking, embedding, retrieval, and chat generation into a single API. Under the hood, it uses DocumentRag for vector search and MultiTurnConversation for contextual responses.

                 ┌─────────────────────────────────────────────┐
                 │                  PdfChat                    │
                 │                                             │
  PDF files ───► │  LoadDocument()                             │
                 │      │                                      │
                 │      ▼                                      │
                 │  Parse ► Chunk ► Embed ► Store              │
                 │                            │                │
  User query ──► │  Submit()                  │                │
                 │      │                     │                │
                 │      ▼                     ▼                │
                 │  Embed query ► Similarity Search            │
                 │                     │                       │
                 │                     ▼                       │
                 │              Top-K passages                 │
                 │                     │                       │
                 │                     ▼                       │
                 │     Inject into prompt + Chat history       │
                 │                     │                       │
                 │                     ▼                       │
                 │              Generate answer                │
                 │              + source refs                  │
                 └─────────────────────────────────────────────┘

Key advantage: PdfChat maintains conversation history automatically, so follow-up questions reference prior answers without any extra code.
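This behavior can be sketched in a few lines; the following is a minimal illustration using the PdfChat API from this tutorial (model IDs and the document path are placeholders):

using LMKit.Model;
using LMKit.Retrieval;

using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
using LM chatModel = LM.LoadFromModelID("gemma3:4b");
using var pdfChat = new PdfChat(chatModel, embeddingModel);

await pdfChat.LoadDocumentAsync("report.pdf");

// First turn establishes the topic.
var first = await pdfChat.SubmitAsync("What was total revenue last year?");

// "that" is resolved from the conversation history automatically.
var followUp = await pdfChat.SubmitAsync("How does that compare to the previous year?");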


Step 3: Write the Program

This program loads two models (embedding and chat), indexes a PDF, and starts an interactive Q&A loop with token streaming.

using System.Text;
using LMKit.Model;
using LMKit.Retrieval;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: DownloadProgress,
    loadingProgress: LoadProgress);
Console.WriteLine(" Done.\n");

Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: DownloadProgress,
    loadingProgress: LoadProgress);
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Create PdfChat and configure it
// ──────────────────────────────────────
using var pdfChat = new PdfChat(chatModel, embeddingModel)
{
    MaximumCompletionTokens = 1024,
    MaxRetrievedPassages = 5,
    MinRelevanceScore = 0.25f
};

// Stream tokens as they are generated
pdfChat.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// 3. Load a PDF document
// ──────────────────────────────────────
string pdfPath = args.Length > 0 ? args[0] : "document.pdf";

if (!File.Exists(pdfPath))
{
    Console.WriteLine($"File not found: {pdfPath}");
    Console.WriteLine("Usage: dotnet run -- <path-to-pdf>");
    return;
}

Console.WriteLine($"Indexing {Path.GetFileName(pdfPath)}...");

pdfChat.DocumentImportProgress += (_, e) =>
{
    int percent = (int)((e.PageIndex + 1) / (float)e.TotalPages * 100);
    Console.Write($"\r  Processing: page {e.PageIndex + 1}/{e.TotalPages} ({percent}%)   ");
};

var indexResult = await pdfChat.LoadDocumentAsync(pdfPath);
Console.WriteLine($"\n  Indexed {indexResult.PageCount} pages ({indexResult.TokenCount} tokens).\n");

// ──────────────────────────────────────
// 4. Interactive Q&A loop
// ──────────────────────────────────────
Console.WriteLine("Ask questions about the document (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("Question: ");
    Console.ResetColor();

    string? question = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(question) || question.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Answer: ");
    Console.ResetColor();

    var result = await pdfChat.SubmitAsync(question);

    // Show source references
    if (result.HasSourceReferences)
    {
        Console.ForegroundColor = ConsoleColor.DarkGray;
        Console.WriteLine("\n\n  Sources:");
        foreach (var source in result.SourceReferences)
            Console.WriteLine($"    p.{source.PageNumber}: {Truncate(source.Excerpt, 80)}");
        Console.ResetColor();
    }

    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"\n  [{result.Response.GeneratedTokenCount} tokens, {result.Response.TokenGenerationRate:F1} tok/s]\n");
    Console.ResetColor();
}

// ──────────────────────────────────────
// Helper methods
// ──────────────────────────────────────
static bool DownloadProgress(string path, long? contentLength, long bytesRead)
{
    if (contentLength.HasValue)
        Console.Write($"\r  Downloading: {(double)bytesRead / contentLength.Value * 100:F1}%   ");
    return true;
}

static bool LoadProgress(float progress)
{
    Console.Write($"\r  Loading: {progress * 100:F0}%   ");
    return true;
}

static string Truncate(string text, int maxLength)
{
    if (string.IsNullOrEmpty(text)) return "";
    string cleaned = text.Replace("\n", " ").Replace("\r", "");
    return cleaned.Length <= maxLength ? cleaned : cleaned.Substring(0, maxLength) + "...";
}

Run it:

dotnet run -- "path/to/your/document.pdf"

Step 4: Example Output

Loading embedding model...
  Loading: 100%  Done.

Loading chat model...
  Loading: 100%  Done.

Indexing quarterly-report.pdf...
  Processing: page 32/32 (100%)
  Indexed 32 pages (21,847 tokens).

Ask questions about the document (or 'quit' to exit):

Question: What was the company's total revenue last year?
Answer: According to the financial statements, total revenue for fiscal year 2024
was $2.47 billion, representing a 12% increase year-over-year. The growth was
primarily driven by the cloud services division, which contributed $1.1 billion.

  Sources:
    p.12: Total revenue for the fiscal year ended December 31, 2024 was $2,470...
    p.15: Cloud services revenue grew 23% to $1.1 billion, accounting for 44.5%...

  [87 tokens, 42.3 tok/s]

Question: How does that compare to the previous year?
Answer: In fiscal year 2023, total revenue was $2.21 billion. The year-over-year
increase of $260 million (12%) exceeded the company's guidance of 8-10% growth.
The largest contributor was cloud services, which grew from $894 million to
$1.1 billion.

  Sources:
    p.12: ...compared to $2,205 million in the prior year, representing growth...
    p.8: Management guidance for FY2024 projected revenue growth of 8-10%...

  [92 tokens, 41.8 tok/s]

Notice that the second question ("How does that compare") works correctly because PdfChat maintains conversation history. The model understands "that" refers to the revenue discussed in the previous turn.


Configuration Options

Passage Count and Relevance Threshold

// More passages = more context, but slower and uses more tokens
pdfChat.MaxRetrievedPassages = 10;

// Lower threshold = more results (higher recall, lower precision)
// Higher threshold = fewer, more relevant results
pdfChat.MinRelevanceScore = 0.3f;

Reranking

A reranker re-scores retrieved passages using a cross-encoder for better ranking accuracy:

pdfChat.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);
// 0.0 = only original similarity score
// 1.0 = only reranker score
// 0.7 = blend favoring reranker (recommended)

Full Document Context vs. Passage Retrieval

For small documents (under ~50 pages), PdfChat can inject the entire document into the prompt context instead of doing passage retrieval:

// Enable full document context for small documents
pdfChat.PreferFullDocumentContext = true;
pdfChat.FullDocumentTokenBudget = 8000;  // max tokens to allocate for document content

This gives the model complete document visibility at the cost of higher token usage. For large documents, passage retrieval is more efficient and more accurate.

Token Streaming with Events

Token streaming is enabled by subscribing to the AfterTextCompletion event before calling SubmitAsync:

pdfChat.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

This writes each token to the console as soon as it is generated, giving the user a responsive experience instead of waiting for the full answer.

Custom System Prompt

Override the default system prompt to control answer style and behavior:

pdfChat.SystemPrompt =
    "You are a document analyst. Answer questions using only the information " +
    "found in the loaded documents. If a question cannot be answered from the " +
    "documents, say: 'This information is not in the loaded documents.' " +
    "Always cite the page number when referencing specific information.";

Loading Multiple Documents

PdfChat supports loading multiple PDFs. Each document is indexed independently and searched together during queries.

string[] pdfPaths = {
    "reports/annual-report-2024.pdf",
    "reports/quarterly-earnings-q4.pdf",
    "policies/employee-handbook.pdf"
};

foreach (string path in pdfPaths)
{
    if (!File.Exists(path))
    {
        Console.WriteLine($"  Skipping {path} (not found)");
        continue;
    }

    Console.Write($"  Indexing {Path.GetFileName(path)}...");
    var result = await pdfChat.LoadDocumentAsync(path);
    Console.WriteLine($" {result.PageCount} pages indexed.");
}

Console.WriteLine($"\nTotal documents loaded: {pdfChat.DocumentCount}");

When querying across multiple documents, source references indicate which document each passage came from via the Name property on DocumentReference.
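As a sketch of per-document attribution (assuming the SourceReferences collection and the Name property described above):

var result = await pdfChat.SubmitAsync("Which policy covers remote work?");

foreach (var source in result.SourceReferences)
{
    // Name identifies the originating document; PageNumber locates the passage within it.
    Console.WriteLine($"{source.Name}, p.{source.PageNumber}: {source.Excerpt}");
}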


Processing Scanned PDFs

For scanned PDFs (image-based, no text layer), configure vision or OCR processing:

using LMKit.Extraction.Ocr;

// Option A: Use a Vision Language Model for layout-aware understanding
pdfChat.DocumentVisionParser = new VlmOcr(visionModel);

// Option B: Use OCR to extract text first, then process normally
pdfChat.OcrEngine = new TesseractOcr();

Vision mode works best when you load a VLM as the vision model (e.g., gemma3:4b). OCR mode extracts text up front, so it works with any text-only chat model.


Model Selection

Embedding Models

Model ID              Size      Best For
embeddinggemma-300m   ~300 MB   General-purpose, fast, low memory (default)
nomic-embed-text      ~260 MB   High-quality text embeddings

Chat Models

Model ID     VRAM      Best For
gemma3:4b    ~3.5 GB   Good quality, fast responses
qwen3:4b     ~3.5 GB   Strong reasoning, multilingual
gemma3:12b   ~8 GB     High accuracy on complex questions
qwen3:8b     ~6 GB     Best balance for document analysis

For document Q&A specifically, qwen3:8b or gemma3:12b deliver noticeably better accuracy on complex multi-hop questions (questions that require synthesizing information from multiple sections). Use gemma3:4b if VRAM is limited.
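Swapping models only changes the IDs passed to LM.LoadFromModelID; the rest of the program is unchanged. A sketch for the higher-accuracy pairing:

// Higher-accuracy pairing for complex document Q&A (~6 GB VRAM for the chat model).
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
using LM chatModel = LM.LoadFromModelID("qwen3:8b");

using var pdfChat = new PdfChat(chatModel, embeddingModel);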


Common Issues

"No relevant passages found"
  Cause: Relevance threshold too high, or the document is not text-searchable.
  Fix: Lower MinRelevanceScore to 0.15; check whether the PDF is scanned (use OCR).

Answer ignores document content
  Cause: System prompt not directive enough.
  Fix: Use a system prompt that explicitly says "answer ONLY from the documents".

Slow indexing on large PDFs
  Cause: Many pages being embedded sequentially.
  Fix: Normal for 100+ page documents. Index once; subsequent queries are fast.

Out of memory loading two models
  Cause: Combined model size exceeds VRAM.
  Fix: Use embeddinggemma-300m (small) + gemma3:4b (medium), or reduce GpuLayerCount.

Garbled text from scanned PDF
  Cause: PDF has no text layer.
  Fix: Set DocumentVisionParser to a VlmOcr instance, or set OcrEngine to a TesseractOcr instance.

Follow-up questions lose context
  Cause: Using SingleTurnConversation instead of PdfChat.
  Fix: PdfChat handles multi-turn automatically; do not replace it with a single-turn approach.

Next Steps