👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/retrieval_quality_tuning
RAG Retrieval Quality Tuning for C# .NET Applications
🎯 Purpose of the Sample
The Retrieval Quality Tuning demo shows how to use LM-Kit.NET to compare and fine-tune retrieval strategies in a RAG pipeline. It provides an interactive environment where you run the same query under different configurations (Vector, BM25, Hybrid search, reranking, MMR diversity, context windows) and observe how each setting affects result quality.
This is the go-to demo for answering: "I have RAG working. How do I make it better?"
👥 Industry Target Audience
This demo is particularly useful for developers and organizations working on:
- Enterprise knowledge bases: tune retrieval to handle both keyword-specific queries (error codes, product IDs) and semantic queries (conceptual questions) within the same index.
- Customer support systems: optimize retrieval so that exact error codes match first (BM25), while conceptual troubleshooting questions still surface relevant context (vector search).
- Legal and compliance: use hybrid search and reranking to ensure exhaustive recall across large document collections where missing a relevant clause has real consequences.
- Technical documentation portals: balance precision and recall for API references, tutorials, and troubleshooting guides that contain both structured (keyword-rich) and unstructured (explanatory) content.
🚀 Problem Solved
RAG pipelines have many tuning knobs, but understanding their impact is difficult without experimentation. Developers often default to basic vector search and never explore BM25, hybrid fusion, reranking, or diversity filtering. This demo makes the tradeoffs concrete and visible:
- See that BM25 outperforms vector search on keyword queries (error codes, config keys)
- See that vector search outperforms BM25 on semantic queries (conceptual questions)
- See that hybrid search captures both, and reranking further improves precision
- See that MMR filtering removes near-duplicate results that waste context window tokens
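The hybrid tradeoff above can be made concrete. The demo's Hybrid strategy uses Reciprocal Rank Fusion (RRF), which merges ranked lists without needing comparable scores. The sketch below is a minimal standalone illustration of RRF in plain C#, independent of the LM-Kit.NET API; all names and the example documents are invented:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HybridFusion
{
    // Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per
    // document, so a document ranked well by BOTH retrievers rises to the
    // top even though cosine and BM25 scores live on different scales.
    // k = 60 is the common default; larger k flattens rank differences.
    public static List<string> Fuse(IEnumerable<IList<string>> rankings, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var ranking in rankings)
            for (int rank = 0; rank < ranking.Count; rank++)
                scores[ranking[rank]] =
                    scores.GetValueOrDefault(ranking[rank]) + 1.0 / (k + rank + 1);
        return scores.OrderByDescending(kv => kv.Value).Select(kv => kv.Key).ToList();
    }

    static void Main()
    {
        var vector = new[] { "replication-overview", "error-codes", "quickstart" };
        var bm25   = new[] { "error-codes", "config-reference" };
        // "error-codes" appears in both lists, so it wins the fused ranking.
        Console.WriteLine(string.Join(", ", Fuse(new IList<string>[] { vector, bm25 })));
    }
}
```

Because RRF only looks at ranks, it needs no score normalization, which is why it is a popular fusion choice for mixing lexical and semantic retrievers.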
💻 Sample Application Description
The Retrieval Quality Tuning demo is a console application that:
- Loads an embedding model (`embeddinggemma-300m`)
- Indexes fictional "NebulaDB" documentation (11 topics, ~100 chunks)
- Enters an interactive loop where you type queries and commands
- Displays retrieved partitions with scores, section names, and payload previews
✨ Key Features
- Three retrieval strategies: switch between Vector, BM25, and Hybrid with a single command
- Reranking with alpha blending: re-score results with a second embedding pass, tuning the blend between original and reranked scores
- MMR diversity: control the balance between relevance and diversity to reduce redundant results
- Context window expansion: include neighboring partitions around each match for broader context
- Side-by-side comparison: `/compare` runs the same query across all strategies and displays results in a unified view
- Fictional knowledge base: ensures retrieval correctness is verifiable (the model cannot know NebulaDB from its training data)
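The MMR diversity feature balances relevance against redundancy with a single lambda parameter. The greedy selection it is based on can be sketched in a few lines of standalone C#; this is an illustrative sketch of the general MMR algorithm, not the LM-Kit.NET implementation, and all names are invented:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MmrSelector
{
    // Maximal Marginal Relevance: greedily pick the item that maximizes
    //   lambda * relevance - (1 - lambda) * max-similarity-to-already-picked.
    // lambda = 1.0 is pure relevance; lower values penalize results that
    // nearly duplicate something already selected.
    public static List<int> Select(double[] relevance, double[,] similarity,
                                   int take, double lambda)
    {
        var picked = new List<int>();
        var remaining = Enumerable.Range(0, relevance.Length).ToList();
        while (picked.Count < take && remaining.Count > 0)
        {
            int best = remaining.OrderByDescending(i =>
                lambda * relevance[i] - (1 - lambda) *
                (picked.Count == 0 ? 0.0 : picked.Max(j => similarity[i, j])))
                .First();
            picked.Add(best);
            remaining.Remove(best);
        }
        return picked;
    }

    static void Main()
    {
        double[] rel = { 0.90, 0.85, 0.50 };           // doc 1 nearly duplicates doc 0
        double[,] sim = { { 1.00, 0.95, 0.10 },
                          { 0.95, 1.00, 0.10 },
                          { 0.10, 0.10, 1.00 } };
        // With lambda = 0.7, the second pick is doc 2, not the near-duplicate doc 1.
        Console.WriteLine(string.Join(", ", Select(rel, sim, 2, 0.7)));
    }
}
```

The example shows why MMR saves context tokens: the near-duplicate partition loses its slot to a less relevant but genuinely different one.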
🤖 Benefits for Agentic Solutions
Adding retrieval quality tuning to autonomous agents provides:
- Adaptive retrieval: agents can switch strategies based on query type (keyword vs. semantic)
- Higher accuracy: hybrid search with reranking delivers consistently better results across diverse query patterns
- Reduced hallucination: MMR diversity filtering ensures context windows contain varied, non-redundant passages
- Lower latency: tuning lets you reach better quality with fewer retrieved partitions, shrinking prompts and speeding up generation
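Adaptive retrieval is the easiest of these to prototype. A sketch of a heuristic query router, where identifier-like tokens steer the query toward lexical search; the patterns, names, and thresholds here are illustrative assumptions, not part of the demo or the LM-Kit.NET API:

```csharp
using System;
using System.Text.RegularExpressions;

static class QueryRouter
{
    public enum Strategy { Bm25, Vector }

    // Crude heuristic: identifier-like tokens (error codes, config keys,
    // version strings) favor lexical BM25; plain natural-language questions
    // favor semantic vector search.
    public static Strategy Route(string query)
    {
        bool hasIdentifier = Regex.IsMatch(
            query, @"\b([A-Z]{2,}-?\d+|\w+[_.]\w+|v?\d+\.\d+)\b");
        return hasIdentifier ? Strategy.Bm25 : Strategy.Vector;
    }

    static void Main()
    {
        Console.WriteLine(Route("What causes NDB-4017 during replication?")); // Bm25
        Console.WriteLine(Route("How does NebulaDB keep replicas in sync?")); // Vector
    }
}
```

A production router would more likely use a small classifier or the hybrid strategy as a safe default, but even a regex gate like this captures the keyword-vs-semantic split the demo makes visible.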
🏗️ Architecture
```
            ┌──────────────────────────┐
            │        User Query        │
            └────────────┬─────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
   ┌──────────┐     ┌──────────┐   ┌──────────────┐
   │  Vector  │     │   BM25   │   │    Hybrid    │
   │ (cosine) │     │ (lexical)│   │ (RRF fusion) │
   └────┬─────┘     └────┬─────┘   └──────┬───────┘
        │                │                │
        └────────────────┼────────────────┘
                         ▼
                ┌──────────────────┐
                │    Reranking     │  (optional)
                │  alpha blending  │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  MMR Diversity   │  (optional)
                │  lambda control  │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  Context Window  │  (optional)
                │  neighbor expand │
                └────────┬─────────┘
                         ▼
                ┌──────────────────┐
                │  Ranked Results  │
                └──────────────────┘
```
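The reranking stage's alpha blending is a linear interpolation between the first-pass retrieval score and the reranker's score. A minimal sketch, assuming the common convention that higher alpha trusts the reranker more (the demo's exact blend direction and names may differ):

```csharp
using System;

static class Rerank
{
    // alpha = 0 keeps the original retrieval score; alpha = 1 trusts the
    // reranker completely. Intermediate values hedge against either signal
    // being noisy for a given query.
    public static double Blend(double originalScore, double rerankScore, double alpha)
        => (1 - alpha) * originalScore + alpha * rerankScore;

    static void Main()
    {
        // A partition the reranker likes more than first-pass retrieval did:
        Console.WriteLine(Blend(originalScore: 0.55, rerankScore: 0.80, alpha: 0.5)); // 0.675
    }
}
```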
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
- Minimum 2 GB VRAM (embedding model only)
📥 Download the Project
▶️ Running the Application
Clone the repository:
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
```

Navigate to the project directory:

```bash
cd lm-kit-net-samples/console_net/rag/retrieval_quality_tuning
```

Build and run the application:

```bash
dotnet build
dotnet run
```

Follow the on-screen prompts to enter queries and use commands to switch strategies.
🔧 Troubleshooting
| Issue | Solution |
|---|---|
| Low scores on all strategies | Lower the minimum score with `/minscore 0.1` |
| Too many similar results | Enable MMR diversity with `/mmr 0.7` |
| BM25 returns no results | BM25 requires keyword overlap; try more specific terms |
| Slow retrieval | Reduce `/topk` or disable reranking |
🚀 Extend the Demo
- Add your own documents: replace the fictional NebulaDB content with real documentation using `RagEngine.ImportTextFromFile()`
- Try MarkdownChunking: switch from `TextChunking` to `MarkdownChunking` for Markdown-formatted content
- Add a chat model: combine retrieval with `RagEngine.QueryPartitions()` to generate grounded answers
- Automate strategy selection: use query classification to route keyword queries to BM25 and semantic queries to vector search
📚 Additional Resources
- LM-Kit.NET RAG Documentation
- Conversational RAG Demo: multi-turn RAG with query generation modes
- Single-Turn RAG Demo: basic RAG pipeline with file storage
- Single-Turn RAG with Qdrant Demo: enterprise RAG with external vector database