Build a RAG Pipeline Over Your Own Documents
Retrieval-Augmented Generation (RAG) grounds LLM responses in your own data. Instead of relying on the model's training data alone, RAG retrieves relevant passages from your documents and injects them into the prompt context. This sharply reduces hallucinations on domain-specific questions and keeps answers current without retraining.
This tutorial builds a working RAG system that indexes text files, persists the index to disk, and answers questions using retrieved context.
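Before wiring up LM-Kit, it helps to see the shape of retrieve-then-generate in miniature. The sketch below scores passages by naive word overlap (a stand-in for real vector similarity; everything in it is illustrative) and injects the winner into the prompt:

```csharp
using System;
using System.Linq;

// Toy retrieve-then-generate flow. Real RAG replaces Overlap() with
// embedding-vector similarity, but the pipeline shape is the same:
// score every passage, pick the best, inject it into the prompt.
string[] passages =
{
    "Press and hold the power button to reset the device to factory settings.",
    "The warranty covers manufacturing defects for two years."
};

string query = "how do I factory reset";

static int Overlap(string a, string b) =>
    a.ToLowerInvariant().Split(' ').Intersect(b.ToLowerInvariant().Split(' ')).Count();

string best = passages.OrderByDescending(p => Overlap(p, query)).First();

string prompt = $"Context:\n{best}\n\nQuestion: {query}\nAnswer using only the context.";
Console.WriteLine(prompt);
```

The rest of this tutorial swaps the toy scorer for a real embedding model and a persistent vector index.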
Why Local RAG Matters
Two real-world problems that on-device RAG solves:
- Data sovereignty in regulated industries. Healthcare, finance, and legal organizations cannot send proprietary documents to cloud APIs. Local RAG keeps all data on-premises while still delivering AI-powered Q&A.
- Offline knowledge bases for field workers. Technicians, inspectors, and engineers need access to manuals and procedures in environments with no internet connectivity. Local RAG runs entirely on a laptop or edge device.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| RAM | 16 GB recommended |
| VRAM | 6 GB (for both models simultaneously) |
| Disk | ~4 GB free for model downloads |
You will load two models: an embedding model (for indexing and search) and a chat model (for generating answers).
Step 1: Create the Project
dotnet new console -n RagQuickstart
cd RagQuickstart
dotnet add package LM-Kit.NET
Step 2: Understand the RAG Architecture
┌──────────────┐   chunk + embed    ┌────────────────┐
│  Your Docs   │ ─────────────────► │   DataSource   │
│ (.txt, .md)  │                    │ (vector index) │
└──────────────┘                    └───────┬────────┘
                                            │ similarity search
┌──────────────┐   embed query              │
│  User Query  │ ──────────────────────────►│
└──────────────┘                            │
                                            ▼
                                    ┌───────────────┐
                                    │  Top-K chunks │
                                    └───────┬───────┘
                                            │ inject into prompt
                                            ▼
                                    ┌───────────────┐
                                    │  Chat Model   │ ──► Answer
                                    └───────────────┘
Key classes:
| Class | Role |
|---|---|
| RagEngine | Orchestrates indexing, search, and LLM querying |
| DataSource | Stores chunk embeddings (in-memory or file-backed) |
| TextChunking | Splits text into overlapping chunks |
| Embedder | Generates vector embeddings |
| SingleTurnConversation | Generates the final answer from retrieved context |
Step 3: Write the Program
using System.Text;
using LMKit.Data;
using LMKit.Model;
using LMKit.Retrieval;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: DownloadProgress,
    loadingProgress: LoadProgress);
Console.WriteLine(" Done.\n");
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: DownloadProgress,
    loadingProgress: LoadProgress);
Console.WriteLine(" Done.\n");
// ──────────────────────────────────────
// 2. Create the RAG engine with a file-backed index
// ──────────────────────────────────────
const string IndexPath = "knowledge_base.dat";
DataSource dataSource;
if (File.Exists(IndexPath))
{
Console.WriteLine("Loading existing index from disk...");
dataSource = DataSource.LoadFromFile(IndexPath, readOnly: false);
}
else
{
dataSource = DataSource.CreateFileDataSource(IndexPath, "KnowledgeBase", embeddingModel);
}
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);
// Configure chunking
rag.DefaultIChunking = new TextChunking
{
MaxChunkSize = 500, // tokens per chunk
MaxOverlapSize = 50 // overlap for context continuity
};
// ──────────────────────────────────────
// 3. Index documents (skip sections already indexed)
// ──────────────────────────────────────
string[] docs = {
"docs/product-manual.txt",
"docs/faq.txt",
"docs/troubleshooting.txt"
};
foreach (string docPath in docs)
{
string sectionName = Path.GetFileNameWithoutExtension(docPath);
if (dataSource.HasSection(sectionName))
{
Console.WriteLine($" Skipping {sectionName} (already indexed)");
continue;
}
if (!File.Exists(docPath))
{
Console.WriteLine($" Skipping {docPath} (file not found)");
continue;
}
Console.WriteLine($" Indexing {sectionName}...");
string content = File.ReadAllText(docPath);
rag.ImportText(content, "KnowledgeBase", sectionName);
}
Console.WriteLine($"\nIndex contains {dataSource.Sections.Count()} section(s).\n");
// ──────────────────────────────────────
// 4. Query loop
// ──────────────────────────────────────
var chat = new SingleTurnConversation(chatModel)
{
SystemPrompt = "Answer the question using only the provided context. " +
"If the context does not contain the answer, say so.",
MaximumCompletionTokens = 512
};
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.WriteLine("Ask a question about your documents (or 'quit' to exit):\n");
while (true)
{
Console.ForegroundColor = ConsoleColor.Green;
Console.Write("Question: ");
Console.ResetColor();
string? query = Console.ReadLine();
if (string.IsNullOrWhiteSpace(query) || query.Equals("quit", StringComparison.OrdinalIgnoreCase))
break;
// Retrieve top-3 most relevant chunks
var matches = rag.FindMatchingPartitions(query, topK: 3, minScore: 0.3f);
if (matches.Count == 0)
{
Console.WriteLine("No relevant passages found in the index.\n");
continue;
}
// Show which sections were matched
Console.ForegroundColor = ConsoleColor.DarkGray;
foreach (var m in matches)
Console.WriteLine($" [{m.SectionIdentifier}] score={m.Similarity:F3}");
Console.ResetColor();
// Generate answer grounded in the retrieved context
Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write("\nAnswer: ");
Console.ResetColor();
var result = rag.QueryPartitions(query, matches, chat);
Console.WriteLine($"\n [{result.GeneratedTokenCount} tokens, {result.TokenGenerationRate:F1} tok/s]\n");
}
// ──────────────────────────────────────
// Helper callbacks
// ──────────────────────────────────────
static bool DownloadProgress(string path, long? contentLength, long bytesRead)
{
if (contentLength.HasValue)
Console.Write($"\r Downloading: {(double)bytesRead / contentLength.Value * 100:F1}% ");
return true;
}
static bool LoadProgress(float progress)
{
Console.Write($"\r Loading: {progress * 100:F0}% ");
return true;
}
Step 4: Create Sample Documents and Run
Create a docs/ folder with a few .txt files containing your content, then:
dotnet run
Example session:
Loading embedding model...
Loading: 100% Done.
Loading chat model...
Loading: 100% Done.
Indexing product-manual...
Indexing faq...
Index contains 2 section(s).
Ask a question about your documents (or 'quit' to exit):
Question: How do I reset the device to factory settings?
[product-manual] score=0.847
[faq] score=0.612
Answer: To reset the device to factory settings, press and hold the power button
and volume-down button simultaneously for 10 seconds until the LED flashes red.
The device will restart and all user data will be erased.
[52 tokens, 38.7 tok/s]
Choosing an Embedding Model
| Model ID | Dimensions | Size | Best For |
|---|---|---|---|
| embeddinggemma-300m | 256 | ~300 MB | General-purpose, fast, lowest memory |
| qwen3-embedding:0.6b | 1024 | ~600 MB | Higher dimension, better recall for large collections |
| nomic-embed-text | 768 | ~260 MB | High-quality text embeddings |
All models are downloaded automatically with LoadFromModelID. Use embeddinggemma-300m as a default starting point. For production workloads with large document collections, qwen3-embedding:0.6b provides stronger recall thanks to its higher dimensionality.
Tuning Retrieval Quality
Chunk Size
| Chunk Size | Effect |
|---|---|
| Small (200-300) | More precise matches, but may split important context |
| Medium (400-500) | Good default balance |
| Large (800-1000) | Better for long-form content, less precise matching |
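The two TextChunking knobs map onto a simple sliding window. This word-level sketch (LM-Kit chunks by tokens, not words; the helper below is purely illustrative) shows how the overlap setting repeats the tail of each chunk:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal sliding-window chunker: each chunk holds up to `size` words and
// repeats the last `overlap` words of the previous chunk for continuity.
// Requires overlap < size, or the window never advances.
static List<string> Chunk(string text, int size, int overlap)
{
    var words = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    var chunks = new List<string>();
    for (int start = 0; start < words.Length; start += size - overlap)
    {
        chunks.Add(string.Join(' ', words.Skip(start).Take(size)));
        if (start + size >= words.Length) break;
    }
    return chunks;
}

var chunks = Chunk(string.Join(' ', Enumerable.Range(1, 12)), size: 5, overlap: 2);
// chunks: ["1 2 3 4 5", "4 5 6 7 8", "7 8 9 10 11", "10 11 12"]
```

Larger overlap reduces the risk that an answer straddles a chunk boundary, at the cost of more stored embeddings.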
Search Parameters
// rag is the RagEngine from Step 3, with the knowledge base attached
string query = "What are the key findings?";
var matches = rag.FindMatchingPartitions(
query,
topK: 5, // return up to 5 chunks
minScore: 0.3f, // minimum cosine similarity threshold
forceUniqueSection: true // at most one result per section
);
Lowering minScore returns more results (higher recall, lower precision). Raising it returns fewer, more relevant results.
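The minScore threshold is a cosine similarity. As a refresher, here is the formula in plain C# (illustrative only; LM-Kit computes this internally):

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|). 1.0 means identical direction,
// 0.0 means orthogonal (unrelated). minScore drops matches below a floor.
static float Cosine(float[] a, float[] b)
{
    float dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
}

Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 1, 0 })); // 1
Console.WriteLine(Cosine(new float[] { 1, 0 }, new float[] { 0, 1 })); // 0
```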
Adding a Reranker
A reranker re-scores retrieved chunks using a cross-encoder, improving ranking quality at a small latency cost:
rag.Reranker = new RagEngine.RagReranker(embeddingModel, rerankedAlpha: 0.7f);
// rerankedAlpha: 0.0 = only original score, 1.0 = only reranker score
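Assuming rerankedAlpha linearly blends the two scores, as its endpoints suggest, the effective score would look like this (a hypothetical sketch of the semantics, not LM-Kit's actual internals):

```csharp
using System;

// Hypothetical blend implied by rerankedAlpha's documented endpoints:
// alpha = 0 keeps the vector-search score, alpha = 1 uses only the reranker.
static float Blend(float vectorScore, float rerankScore, float alpha) =>
    (1 - alpha) * vectorScore + alpha * rerankScore;

Console.WriteLine(Blend(0.6f, 0.9f, 0.7f)); // ≈ 0.81, weighted toward the reranker
```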
Persistence and Incremental Updates
The DataSource.CreateFileDataSource approach persists embeddings to disk. On subsequent runs, DataSource.LoadFromFile loads the index instantly without re-embedding.
To add new documents later:
// dataSource and rag are the file-backed index and engine from Step 3
if (!dataSource.HasSection("new-document"))
{
string content = File.ReadAllText("docs/new-document.txt");
rag.ImportText(content, "KnowledgeBase", "new-document");
}
Scaling Up: PDF and Markdown Documents
For PDF files, use DocumentRag instead of RagEngine for built-in document parsing:
// embeddingModel is the embedding model loaded in Step 3
var docRag = new DocumentRag(embeddingModel);
var attachment = new Attachment("report.pdf");
var metadata = new DocumentRag.DocumentMetadata(attachment, id: "q4-report");
await docRag.ImportDocumentAsync(attachment, metadata, "Reports");
var matches = docRag.FindMatchingPartitions("quarterly revenue", topK: 5);
For the highest-level PDF Q&A experience (with chat history and source references), use PdfChat:
// chatModel and embeddingModel are the models loaded in Step 3
var pdfChat = new PdfChat(chatModel, embeddingModel);
await pdfChat.LoadDocumentAsync("report.pdf");
var response = await pdfChat.SubmitAsync("What were the key findings?");
Console.WriteLine(response.Response.Completion);
Custom Prompt Templates
Override how retrieved context is injected into the prompt:
// inside the query loop from Step 3, pass a custom template to QueryPartitions:
string customTemplate = @"Use the following reference material to answer the user's question.
If the material does not contain the answer, state that clearly.

## Reference Material:
@context

## User Question:
@question";

var result = rag.QueryPartitions(query, customTemplate, matches, chat);
The placeholders @context and @question are replaced automatically.
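Conceptually the substitution is plain string replacement, equivalent to the sketch below (illustrative only; RagEngine performs this internally, and FillTemplate is a hypothetical helper):

```csharp
using System;

// Conceptual equivalent of the @context / @question substitution.
static string FillTemplate(string template, string context, string question) =>
    template.Replace("@context", context).Replace("@question", question);

string prompt = FillTemplate("Context: @context\nQ: @question",
                             "The sky is blue.", "What color is the sky?");
Console.WriteLine(prompt);
// prompt == "Context: The sky is blue.\nQ: What color is the sky?"
```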
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Low similarity scores | Embedding model not suited to your domain | Try nomic-embed-text or increase chunk overlap |
| Answers ignore retrieved context | System prompt too weak | Strengthen the instruction: "Answer ONLY from the provided context" |
| Index file grows large | Many large documents | Use MarkdownChunking for structured docs, or reduce MaxChunkSize |
| Slow indexing | Large corpus on CPU | Use GPU-accelerated embedding, or batch-index offline |
Next Steps
- Boost Retrieval with Hybrid Search: combine vector and BM25 search for broader recall.
- Build Conversational RAG with RagChat: wrap your pipeline in a multi-turn conversational interface.
- Improve Recall with Multi-Query and HyDE Retrieval: expand queries to capture more relevant passages.
- Diversify and Filter RAG Results: reduce redundancy with MMR and scope retrieval with metadata filtering.
- Improve RAG Results with Reranking: add a cross-encoder reranker to boost retrieval precision.
- Optimize RAG with Custom Chunking Strategies: tailor TextChunking, MarkdownChunking, or HtmlChunking to your content.
- Build a Unified Multimodal RAG System: index audio, images, and text in one knowledge base.
- Chat with PDF Documents: high-level PDF chat with PdfChat.
- Samples: Conversational RAG: multi-turn RAG with RagChat, query contextualization, Multi-Query, and HyDE.
- Samples: Single-Turn RAG: basic single-turn Q&A demo.