👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/retrieval_quality_tuning
RAG Retrieval Quality Tuning for C# .NET Applications
🎯 Purpose of the Sample
The Retrieval Quality Tuning demo shows how to use LM-Kit.NET to compare and fine-tune retrieval strategies in a RAG pipeline. It provides an interactive environment where you run the same query under different configurations (Vector, BM25, Hybrid search, reranking, MMR diversity, context windows) and observe how each setting affects result quality.
This is the go-to demo for answering: "I have RAG working. How do I make it better?"
👥 Industry Target Audience
This demo is particularly useful for developers and organizations working on:
- Enterprise knowledge bases: tune retrieval to handle both keyword-specific queries (error codes, product IDs) and semantic queries (conceptual questions) within the same index.
- Customer support systems: optimize retrieval so that exact error codes match first (BM25), while conceptual troubleshooting questions still surface relevant context (vector search).
- Legal and compliance: use hybrid search and reranking to ensure exhaustive recall across large document collections where missing a relevant clause has real consequences.
- Technical documentation portals: balance precision and recall for API references, tutorials, and troubleshooting guides that contain both structured (keyword-rich) and unstructured (explanatory) content.
🚀 Problem Solved
RAG pipelines have many tuning knobs, but understanding their impact is difficult without experimentation. Developers often default to basic vector search and never explore BM25, hybrid fusion, reranking, or diversity filtering. This demo makes the tradeoffs concrete and visible:
- See that BM25 outperforms vector search on keyword queries (error codes, config keys)
- See that vector search outperforms BM25 on semantic queries (conceptual questions)
- See that hybrid search captures both, and reranking further improves precision
- See that MMR filtering removes near-duplicate results that waste context window tokens
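The hybrid tradeoff above can be made concrete. The demo's Hybrid strategy uses Reciprocal Rank Fusion (RRF), which merges ranked lists without needing comparable scores. The sketch below is a minimal standalone illustration of RRF in plain C#, independent of the LM-Kit.NET API; all names and the example documents are invented:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HybridFusion
{
    // Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per
    // document, so a document ranked well by BOTH retrievers rises to the
    // top even though cosine and BM25 scores live on different scales.
    // k = 60 is the common default; larger k flattens rank differences.
    public static List<string> Fuse(IEnumerable<IList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
            for (int rank = 0; rank < ranking.Count; rank++)
                scores[ranking[rank]] =
                    scores.GetValueOrDefault(ranking[rank]) + 1.0 / (k + rank + 1);
        return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
    }

    static void Main()
    {
        var vector = new[] { "replication-overview", "error-codes", "quickstart" };
        var bm25   = new[] { "error-codes", "config-reference" };
        // "error-codes" appears in both lists, so it wins the fused ranking.
        Console.WriteLine(string.Join(", ", Fuse(new IList<string>[] { vector, bm25 })));
    }
}
```

Because RRF only looks at ranks, it needs no score normalization, which is why it is a popular fusion choice for mixing lexical and semantic retrievers.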
💻 Sample Application Description
The Retrieval Quality Tuning demo is a console application that:
- Loads an embedding model (`embeddinggemma-300m`)
- Indexes fictional "NebulaDB" documentation (11 topics, ~100 chunks)
- Enters an interactive loop where you type queries and commands
- Displays retrieved partitions with scores, section names, and payload previews
✨ Key Features
- Three retrieval strategies: switch between Vector, BM25, and Hybrid with a single command
- Reranking with alpha blending: re-score results with a second embedding pass, tuning the blend between original and reranked scores
- MMR diversity: control the balance between relevance and diversity to reduce redundant results
- Context window expansion: include neighboring partitions around each match for broader context
- Side-by-side comparison: `/compare` runs the same query across all strategies and displays results in a unified view
- Fictional knowledge base: ensures retrieval correctness is verifiable (the model cannot know NebulaDB from its training data)
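The MMR diversity feature balances relevance against redundancy with a single lambda parameter. The greedy selection it is based on can be sketched in a few lines of standalone C#; this is an illustrative sketch of the general MMR algorithm, not the LM-Kit.NET implementation, and all names are invented:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MmrSelector
{
    // Maximal Marginal Relevance: greedily pick the item that maximizes
    //   lambda * relevance - (1 - lambda) * max-similarity-to-already-picked.
    // lambda = 1.0 is pure relevance; lower values penalize results that
    // nearly duplicate something already selected.
    public static List<int> Select(double[] relevance, double[,] similarity,
                                   int take, double lambda)
    {
        var picked = new List<int>();
        var remaining = Enumerable.Range(0, relevance.Length).ToList();
        while (picked.Count < take && remaining.Count > 0)
        {
            int best = remaining.OrderByDescending(i =>
                lambda * relevance[i] - (1 - lambda) *
                (picked.Count == 0 ? 0.0 : picked.Max(j => similarity[i, j])))
                .First();
            picked.Add(best);
            remaining.Remove(best);
        }
        return picked;
    }

    static void Main()
    {
        double[] rel = { 0.90, 0.85, 0.50 };           // doc 1 nearly duplicates doc 0
        double[,] sim = { { 1.00, 0.95, 0.10 },
                          { 0.95, 1.00, 0.10 },
                          { 0.10, 0.10, 1.00 } };
        // With lambda = 0.7, the second pick is doc 2, not the near-duplicate doc 1.
        Console.WriteLine(string.Join(", ", Select(rel, sim, 2, 0.7)));
    }
}
```

The example shows why MMR saves context tokens: the near-duplicate partition loses its slot to a less relevant but genuinely different one.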
🤖 Benefits for Agentic Solutions
Adding retrieval quality tuning to autonomous agents provides:
- Adaptive retrieval: agents can switch strategies based on query type (keyword vs. semantic)
- Higher accuracy: hybrid search with reranking delivers consistently better results across diverse query patterns
- Reduced hallucination: MMR diversity filtering ensures context windows contain varied, non-redundant passages
- Lower latency: tuning lets you reach better quality with fewer retrieved partitions, shrinking prompts and speeding up generation
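Adaptive retrieval is the easiest of these to prototype. A sketch of a heuristic query router, where identifier-like tokens steer the query toward lexical search; the patterns, names, and thresholds here are illustrative assumptions, not part of the demo or the LM-Kit.NET API:

```csharp
using System;
using System.Text.RegularExpressions;

static class QueryRouter
{
    public enum Strategy { Bm25, Vector }

    // Crude heuristic: identifier-like tokens (error codes, config keys,
    // version strings) favor lexical BM25; plain natural-language questions
    // favor semantic vector search.
    public static Strategy Route(string query)
    {
        bool hasIdentifier = Regex.IsMatch(
            query, @"\b([A-Z]{2,}-?\d+|\w+[_.]\w+|v?\d+\.\d+)\b");
        return hasIdentifier ? Strategy.Bm25 : Strategy.Vector;
    }

    static void Main()
    {
        Console.WriteLine(Route("What causes NDB-4017 during replication?")); // Bm25
        Console.WriteLine(Route("How does NebulaDB keep replicas in sync?")); // Vector
    }
}
```

A production router would more likely use a small classifier or the hybrid strategy as a safe default, but even a regex gate like this captures the keyword-vs-semantic split the demo makes visible.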
🏗️ Architecture
```
            ┌──────────────────────────┐
            │        User Query        │
            └────────────┬─────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
   ┌──────────┐     ┌──────────┐   ┌──────────────┐
   │  Vector  │     │   BM25   │   │    Hybrid    │
   │ (cosine) │     │ (lexical)│   │ (RRF fusion) │
   └────┬─────┘     └────┬─────┘   └──────┬───────┘
        │                │                │
        └────────────────┼────────────────┘
                         ▼
                ┌──────────────────┐
                │    Reranking     │  (optional)
                │  alpha blending  │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  MMR Diversity   │  (optional)
                │  lambda control  │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  Context Window  │  (optional)
                │  neighbor expand │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  Ranked Results  │
                └──────────────────┘
```
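The reranking stage's alpha blending is a linear interpolation between the first-pass retrieval score and the reranker's score. A minimal sketch, assuming the common convention that higher alpha trusts the reranker more (the demo's exact blend direction and names may differ):

```csharp
using System;

static class Rerank
{
    // alpha = 0 keeps the original retrieval score; alpha = 1 trusts the
    // reranker completely. Intermediate values hedge against either signal
    // being noisy for a given query.
    public static double Blend(double originalScore, double rerankScore, double alpha)
        => (1 - alpha) * originalScore + alpha * rerankScore;

    static void Main()
    {
        // A partition the reranker likes more than first-pass retrieval did:
        Console.WriteLine(Blend(originalScore: 0.55, rerankScore: 0.80, alpha: 0.5)); // 0.675
    }
}
```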
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
- Minimum 2 GB VRAM (embedding model only)
📥 Download the Project
▶️ Running the Application
Clone the repository:
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
```

Navigate to the project directory:

```bash
cd lm-kit-net-samples/console_net/rag/retrieval_quality_tuning
```

Build and run the application:

```bash
dotnet build
dotnet run
```

Follow the on-screen prompts to enter queries and use commands to switch strategies.
🔧 Troubleshooting
| Issue | Solution |
|---|---|
| Low scores on all strategies | Lower the minimum score with `/minscore 0.1` |
| Too many similar results | Enable MMR diversity with `/mmr 0.7` |
| BM25 returns no results | BM25 requires keyword overlap; try more specific terms |
| Slow retrieval | Reduce `/topk` or disable reranking |
🚀 Extend the Demo
- Add your own documents: replace the fictional NebulaDB content with real documentation using `RagEngine.ImportTextFromFile()`
- Try MarkdownChunking: switch from `TextChunking` to `MarkdownChunking` for Markdown-formatted content
- Add a chat model: combine retrieval with `RagEngine.QueryPartitions()` to generate grounded answers
- Automate strategy selection: use query classification to route keyword queries to BM25 and semantic queries to vector search
📚 Additional Resources
- LM-Kit.NET RAG Documentation
- Conversational RAG Demo: multi-turn RAG with query generation modes
- Single-Turn RAG Demo: basic RAG pipeline with file storage
- Single-Turn RAG with Qdrant Demo: enterprise RAG with external vector database