👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/conversational_rag
Conversational RAG for C# .NET Applications
🎯 Purpose of the Demo
The Conversational RAG demo shows how to use the RagChat class from LM-Kit.NET to build a multi-turn Q&A chatbot grounded in a custom knowledge base. Unlike a single-shot question answering system, RagChat maintains conversation history across turns, automatically rewrites follow-up questions into self-contained retrieval queries, and supports four distinct query generation strategies (Original, Contextual, Multi-Query, and HyDE).
The knowledge base is entirely fictional (a made-up company called "NovaPulse Technologies"), so every correct answer must come from retrieval. This makes it easy to verify that the RAG pipeline is working: if the model answers with specific NovaPulse facts, the retrieval pipeline found the right context.
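At a high level, the demo's flow can be sketched in a few lines. This is an illustrative sketch only: the `RagChat` and `RagEngine` class names come from this demo, but the constructors, the helper methods, and the `Submit` call shown here are assumptions, not the verified LM-Kit.NET API — consult the demo source for the real signatures.

```csharp
// Illustrative sketch -- class names (RagChat, RagEngine) appear in this demo,
// but every signature and helper below is an assumption, not verified API.
var embedder  = LoadEmbeddingModel();       // hypothetical helper; demo uses embeddinggemma-300m
var chatModel = LoadChatModel();            // hypothetical helper; e.g. Qwen-3 8B

var engine = new RagEngine(embedder);       // assumed constructor
IndexSampleKnowledge(engine);               // hypothetical helper indexing the five NovaPulse articles

var chat   = new RagChat(chatModel, engine); // assumed constructor
var answer = chat.Submit("What navigation modules does NovaPulse sell?"); // assumed method

// A follow-up like "What about the cheapest one?" is automatically rewritten
// into a standalone retrieval query using the conversation history.
```

Because every NovaPulse fact is fictional, any specific detail in `answer` is direct evidence that retrieval, not model memory, supplied the context.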
👥 Who Should Use This Demo
This demo is ideal for developers and teams who need to:
- Build internal Q&A systems: answer employee questions from HR handbooks, product documentation, or knowledge bases.
- Create customer support bots: ground chatbot responses in verified company information instead of relying on model memory.
- Prototype multi-turn RAG: experiment with contextual query rewriting, Multi-Query retrieval, and HyDE before integrating into production.
- Evaluate retrieval strategies: compare how different `QueryGenerationMode` options affect answer quality and relevance.
🚀 What Problem It Solves
Standard chat completions hallucinate when asked about private or recent information that is not part of the model's training data. A single-turn RAG pipeline solves this for isolated questions, but multi-turn conversations introduce a new challenge: follow-up questions like "What about the cheapest one?" lose their meaning without the prior context.
RagChat solves both problems in a single class:
- Grounding: every response is backed by retrieved text partitions from your knowledge base.
- Contextual rewriting: follow-up questions are automatically reformulated into standalone queries using conversation history, so retrieval stays accurate across turns.
- Switchable strategies: choose between Original, Contextual, Multi-Query (with Reciprocal Rank Fusion), and HyDE at runtime.
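Switching strategies is a single property change at runtime. The property name `QueryGenerationMode` appears in this demo, but the enum type and member spellings below are assumptions based on the four modes listed above:

```csharp
// The /mode command in the demo maps user input to one of these strategies.
// Enum member names are assumptions derived from the mode names in this README.
chat.QueryGenerationMode = QueryGenerationMode.Original;   // use the question as-is
chat.QueryGenerationMode = QueryGenerationMode.Contextual; // rewrite follow-ups into standalone queries
chat.QueryGenerationMode = QueryGenerationMode.MultiQuery; // fan out variants, fuse with Reciprocal Rank Fusion
chat.QueryGenerationMode = QueryGenerationMode.HyDE;       // embed a hypothetical answer document
```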
💻 Demo Application Overview
The demo is a console application that walks through the full conversational RAG pipeline:
- Select a chat model from a menu (or provide a custom model URI / model ID)
- Load a small embedding model (`embeddinggemma-300m`) for vector search
- Index five fictional knowledge articles into a `RagEngine`
- Start a multi-turn conversation powered by `RagChat`
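The indexing step can be sketched as follows. Only `GetSampleKnowledge()` and `RagEngine` are named in this demo; the `ImportText` call and its parameters are assumptions standing in for whatever ingestion method the demo actually uses:

```csharp
// Sketch of the indexing phase -- ImportText and its parameters are assumptions;
// GetSampleKnowledge() is the demo's own source of the five fictional articles.
foreach (var (title, body) in GetSampleKnowledge())
{
    // The demo indexes each article with a configurable chunk size.
    engine.ImportText(body, sectionIdentifier: title); // assumed method and overload
}
```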
✨ Key Features
- Model selection menu: choose from Qwen-3 8B, Gemma 3 12B, Phi-4 14.7B, GPT OSS 20B, GLM 4.7 Flash, or enter a custom model
- Automatic model download: models are fetched on first run with progress feedback
- In-memory knowledge base: five fictional topics indexed with configurable chunk sizes
- Four query generation modes: switch at runtime via the `/mode` command
- Retrieval telemetry: every query displays partition counts, similarity scores per topic, and retrieval timing
- Pipeline status feedback: console messages show each phase (rewriting, retrieval, generation) so you always know what the system is doing
- Interactive commands: `/reset`, `/mode`, `/topk`, `/stats`, `/help`
Example Output
```
Rewriting query and retrieving relevant context...
Retrieved 3/3 partitions in 142ms
  Product Catalog: 2 partition(s), best score 0.847
  Company Overview: 1 partition(s), best score 0.612
Generating response...

Assistant: NovaPulse offers three navigation modules built on their
Quantum Pulse Positioning (QPP) platform: the QNav-100 for small
satellites (CHF 38,000), the QNav-500 for commercial missions
(CHF 145,000), and the QNav-900X for defense and deep-space
applications (~CHF 400,000+)...

[3 partitions | 2048 ctx tokens | quality: 0.92]
```
🏗️ Architecture
```
┌──────────────┐     ┌─────────────────────┐     ┌──────────────┐
│  User Input  │────>│  Query Rewriting    │────>│  Retrieval   │
│  (Console)   │     │  (Contextual/HyDE/  │     │  (RagEngine) │
│              │     │  Multi-Query)       │     │              │
└──────────────┘     └─────────────────────┘     └──────┬───────┘
                                                        │
                     ┌─────────────────────┐            │
                     │   Prompt Builder    │<───────────┘
                     │ (query + partitions │
                     │     + history)      │
                     └──────────┬──────────┘
                                │
                     ┌──────────▼──────────┐
                     │   LLM Generation    │
                     │  (streaming tokens) │
                     └─────────────────────┘
```
⚙️ Getting Started
Prerequisites
- .NET 8.0 or later
- Minimum 6 GB VRAM (Qwen-3 8B, default selection)
Download and Run

Clone the repository:

```shell
git clone https://github.com/LM-Kit/lm-kit-net-samples
```

Navigate to the project directory:

```shell
cd lm-kit-net-samples/console_net/rag/conversational_rag
```

Build and run:

```shell
dotnet build
dotnet run
```

Follow the on-screen prompts to select a model and start asking questions about NovaPulse Technologies.
🔧 Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Out of memory | Model too large for GPU | Select a smaller model (option 0, Qwen-3 8B) |
| Slow first response | Model downloading | Wait for download to finish (progress is displayed) |
| Irrelevant answers | Low similarity scores | Increase `MaxRetrievedPartitions` via `/topk` or try `/mode` Multi-Query |
🚀 Extend the Demo
- Import real documents: replace `GetSampleKnowledge()` with `ragEngine.ImportDocument()` to ingest PDFs, DOCX, or web pages
- Try different modes: use `/mode` to compare Original, Contextual, Multi-Query, and HyDE retrieval
- Tune retrieval: adjust `MaxRetrievedPartitions`, `MinRelevanceScore`, and `MmrLambda` for your domain
- Add a system prompt: set `chat.SystemPrompt` to customize the assistant's personality or domain focus
- Persist the knowledge base: use `ragEngine.Save()` and `RagEngine.Load()` to avoid re-indexing on every run
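The persistence idea in the last point might look like this. `Save` and `Load` are the only names confirmed above; the file path and the exact overloads are assumptions:

```csharp
// Hedged sketch: only Save/Load are named in this README; the path
// and the overload shapes below are assumptions, not verified API.
const string kbPath = "novapulse.rag"; // hypothetical file name

if (File.Exists(kbPath))
{
    ragEngine = RagEngine.Load(kbPath);   // assumed overload: restore the index
}
else
{
    IndexSampleKnowledge(ragEngine);      // hypothetical helper: build the index once
    ragEngine.Save(kbPath);               // then persist it for later runs
}
```

Re-indexing cost grows with corpus size, so persisting the engine matters most once you swap the five sample articles for real document imports.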