👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/retrieval_quality_tuning

RAG Retrieval Quality Tuning for C# .NET Applications


🎯 Purpose of the Sample

The Retrieval Quality Tuning demo shows how to use LM-Kit.NET to compare and fine-tune retrieval strategies in a RAG pipeline. It provides an interactive environment where you run the same query under different configurations (vector, BM25, and hybrid search; reranking; MMR diversity; context window expansion) and observe how each setting affects result quality.

This is the go-to demo for answering: "I have RAG working. How do I make it better?"


👥 Industry Target Audience

This demo is particularly useful for developers and organizations working on:

  • Enterprise knowledge bases: tune retrieval to handle both keyword-specific queries (error codes, product IDs) and semantic queries (conceptual questions) within the same index.
  • Customer support systems: optimize retrieval so that exact error codes match first (BM25), while conceptual troubleshooting questions still surface relevant context (vector search).
  • Legal and compliance: use hybrid search and reranking to ensure exhaustive recall across large document collections where missing a relevant clause has real consequences.
  • Technical documentation portals: balance precision and recall for API references, tutorials, and troubleshooting guides that contain both structured (keyword-rich) and unstructured (explanatory) content.

🚀 Problem Solved

RAG pipelines have many tuning knobs, but understanding their impact is difficult without experimentation. Developers often default to basic vector search and never explore BM25, hybrid fusion, reranking, or diversity filtering. This demo makes the tradeoffs concrete and visible:

  • See that BM25 outperforms vector search on keyword queries (error codes, config keys)
  • See that vector search outperforms BM25 on semantic queries (conceptual questions)
  • See that hybrid search captures both, and reranking further improves precision
  • See that MMR filtering removes near-duplicate results that waste context window tokens
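To make the hybrid tradeoff concrete, here is an illustrative sketch of Reciprocal Rank Fusion (the "RRF fusion" strategy named in the architecture below). This is a generic textbook implementation, not LM-Kit.NET's internals: it scores each document as the sum of 1/(k + rank) across the vector and BM25 rankings, so a document ranked well by both lists beats one ranked first in only one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RrfFusion
{
    // Fuse two ranked ID lists with Reciprocal Rank Fusion:
    // score(d) = sum over lists of 1 / (k + rank(d)), where rank is 1-based.
    // k (commonly 60) dampens the advantage of top ranks in a single list.
    public static List<string> Fuse(IList<string> vectorRanked,
                                    IList<string> bm25Ranked,
                                    int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in new[] { vectorRanked, bm25Ranked })
            for (int i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out double s);
                scores[list[i]] = s + 1.0 / (k + i + 1);
            }
        return scores.OrderByDescending(p => p.Value)
                     .Select(p => p.Key)
                     .ToList();
    }
}
```

With this scheme, a chunk ranked #2 by vector search and #1 by BM25 outranks a chunk that only one strategy placed first, which is why hybrid search "captures both" query types.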

💻 Sample Application Description

The Retrieval Quality Tuning demo is a console application that:

  • Loads an embedding model (embeddinggemma-300m)
  • Indexes fictional "NebulaDB" documentation (11 topics, ~100 chunks)
  • Enters an interactive loop where you type queries and commands
  • Displays retrieved partitions with scores, section names, and payload previews

✨ Key Features

  • Three retrieval strategies: switch between Vector, BM25, and Hybrid with a single command
  • Reranking with alpha blending: re-score results with a second embedding pass, tuning the blend between original and reranked scores
  • MMR diversity: control the balance between relevance and diversity to reduce redundant results
  • Context window expansion: include neighboring partitions around each match for broader context
  • Side-by-side comparison: /compare runs the same query across all strategies and displays results in a unified view
  • Fictional knowledge base: ensures retrieval correctness is verifiable (model cannot know NebulaDB from training data)
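The "alpha blending" feature above is, in the usual formulation, a linear interpolation between the original retrieval score and the reranker's score. The sketch below illustrates that formula as an assumption about how such blending typically works, not LM-Kit.NET's exact implementation:

```csharp
using System;

static class Rerank
{
    // Blend an original retrieval score with a reranker score.
    // alpha = 1.0 trusts the reranker fully; alpha = 0.0 keeps the original ranking.
    public static double Blend(double originalScore, double rerankScore, double alpha)
    {
        if (alpha < 0.0 || alpha > 1.0)
            throw new ArgumentOutOfRangeException(nameof(alpha));
        return alpha * rerankScore + (1.0 - alpha) * originalScore;
    }
}
```

For example, with alpha = 0.5 an original score of 0.4 and a rerank score of 0.8 blend to 0.6; sweeping alpha in the demo lets you see at which point the reranker starts reordering results.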

🤖 Benefits for Agentic Solutions

Adding retrieval quality tuning to autonomous agents provides:

  • Adaptive retrieval: agents can switch strategies based on query type (keyword vs. semantic)
  • Higher accuracy: hybrid search with reranking delivers consistently better results across diverse query patterns
  • Reduced hallucination: MMR diversity filtering ensures context windows contain varied, non-redundant passages
  • Lower latency: tuning lets you reach the same answer quality with fewer retrieved partitions, shrinking both prompt size and response time

🏗️ Architecture

                            ┌──────────────────────────┐
                            │     User Query           │
                            └────────────┬─────────────┘
                                         │
                        ┌────────────────┼────────────────┐
                        ▼                ▼                ▼
                  ┌──────────┐    ┌──────────┐    ┌──────────────┐
                  │  Vector  │    │  BM25    │    │   Hybrid     │
                  │ (cosine) │    │ (lexical)│    │ (RRF fusion) │
                  └────┬─────┘    └────┬─────┘    └──────┬───────┘
                       │               │                 │
                       └───────────────┼─────────────────┘
                                       ▼
                              ┌──────────────────┐
                              │   Reranking      │  (optional)
                              │   alpha blending │
                              └────────┬─────────┘
                                       ▼
                              ┌──────────────────┐
                              │  MMR Diversity   │  (optional)
                              │  lambda control  │
                              └────────┬─────────┘
                                       ▼
                              ┌──────────────────┐
                              │ Context Window   │  (optional)
                              │ neighbor expand  │
                              └────────┬─────────┘
                                       ▼
                              ┌──────────────────┐
                              │  Ranked Results  │
                              └──────────────────┘
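The "MMR Diversity" stage in the diagram trades relevance against redundancy under a single lambda knob. Here is a minimal sketch of the standard greedy MMR selection loop; the relevance scores and pairwise similarity matrix stand in for embedding cosine similarities, and this is an illustration of the technique rather than LM-Kit.NET's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Mmr
{
    // Greedy Maximal Marginal Relevance: repeatedly pick the candidate maximizing
    //   lambda * relevance(c) - (1 - lambda) * max similarity(c, already selected)
    // lambda = 1.0 is pure relevance; lower values penalize near-duplicates harder.
    public static List<int> Select(double[] relevance,
                                   double[,] similarity,
                                   double lambda,
                                   int count)
    {
        var selected = new List<int>();
        var remaining = Enumerable.Range(0, relevance.Length).ToList();
        while (selected.Count < count && remaining.Count > 0)
        {
            int best = -1;
            double bestScore = double.NegativeInfinity;
            foreach (int c in remaining)
            {
                double redundancy = selected.Count == 0
                    ? 0.0
                    : selected.Max(s => similarity[c, s]);
                double score = lambda * relevance[c] - (1.0 - lambda) * redundancy;
                if (score > bestScore) { bestScore = score; best = c; }
            }
            selected.Add(best);
            remaining.Remove(best);
        }
        return selected;
    }
}
```

With two near-duplicate top hits, the second pick skips the duplicate in favor of a less relevant but distinct passage, which is exactly the "remove near-duplicates that waste context window tokens" effect described above.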

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later
  • Minimum 2 GB VRAM (embedding model only)

▶️ Running the Application

  1. Clone the repository:

    git clone https://github.com/LM-Kit/lm-kit-net-samples
    
  2. Navigate to the project directory:

    cd lm-kit-net-samples/console_net/rag/retrieval_quality_tuning
    
  3. Build and run the application:

    dotnet build
    dotnet run
    
  4. Follow the on-screen prompts to enter queries and use commands to switch strategies.


🔧 Troubleshooting

  • Low scores on all strategies: lower the minimum score with /minscore 0.1
  • Too many similar results: enable MMR diversity with /mmr 0.7
  • BM25 returns no results: BM25 requires keyword overlap; try more specific terms
  • Slow retrieval: reduce /topk or disable reranking

🚀 Extend the Demo

  • Add your own documents: replace the fictional NebulaDB content with real documentation using RagEngine.ImportTextFromFile()
  • Try MarkdownChunking: switch from TextChunking to MarkdownChunking for Markdown-formatted content
  • Add a chat model: combine retrieval with RagEngine.QueryPartitions() to generate grounded answers
  • Automate strategy selection: use query classification to route keyword queries to BM25 and semantic queries to vector search
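The last bullet can be prototyped with a trivial heuristic classifier. The patterns below (uppercase error codes, dotted config keys) are illustrative assumptions to get started, not part of the demo or of LM-Kit.NET:

```csharp
using System.Text.RegularExpressions;

enum Strategy { Vector, Bm25 }

static class QueryRouter
{
    // Route keyword-looking queries (error codes, config keys) to BM25,
    // everything else to vector search. Patterns are illustrative, e.g.
    // "ERR1042" or "nebula.storage.path".
    static readonly Regex KeywordPattern = new Regex(
        @"\b[A-Z]{2,}\d{2,}\b|\b\w+(\.\w+){2,}\b",
        RegexOptions.Compiled);

    public static Strategy Route(string query) =>
        KeywordPattern.IsMatch(query) ? Strategy.Bm25 : Strategy.Vector;
}
```

A production version would more likely use an embedding-based classifier, but even a regex router like this demonstrates the keyword-vs-semantic split the demo makes visible.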
