👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/conversational_rag

Conversational RAG for C# .NET Applications


🎯 Purpose of the Demo

The Conversational RAG demo shows how to use the RagChat class from LM-Kit.NET to build a multi-turn Q&A chatbot grounded in a custom knowledge base. Unlike a single-shot question answering system, RagChat maintains conversation history across turns, automatically rewrites follow-up questions into self-contained retrieval queries, and supports four distinct query generation strategies (Original, Contextual, Multi-Query, and HyDE).

The knowledge base is entirely fictional (a made-up company called "NovaPulse Technologies"), so every correct answer must come from retrieval. This makes it easy to verify that the RAG pipeline is working: if the model answers with specific NovaPulse facts, the retrieval pipeline found the right context.


👥 Who Should Use This Demo

This demo is ideal for developers and teams who need to:

  • Build internal Q&A systems: answer employee questions from HR handbooks, product documentation, or knowledge bases.
  • Create customer support bots: ground chatbot responses in verified company information instead of relying on model memory.
  • Prototype multi-turn RAG: experiment with contextual query rewriting, Multi-Query retrieval, and HyDE before integrating into production.
  • Evaluate retrieval strategies: compare how different QueryGenerationMode options affect answer quality and relevance.

🚀 What Problem It Solves

Standard chat completions hallucinate when asked about private or recent information that is not part of the model's training data. A single-turn RAG pipeline solves this for isolated questions, but multi-turn conversations introduce a new challenge: follow-up questions like "What about the cheapest one?" lose their meaning without the prior context.

RagChat solves both problems in a single class:

  1. Grounding: every response is backed by retrieved text partitions from your knowledge base.
  2. Contextual rewriting: follow-up questions are automatically reformulated into standalone queries using conversation history, so retrieval stays accurate across turns.
  3. Switchable strategies: choose between Original, Contextual, Multi-Query (with Reciprocal Rank Fusion), and HyDE at runtime.
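
In code, these three capabilities might come together roughly as follows. This is an illustrative sketch, not the verified LM-Kit.NET API: `RagChat`, `QueryGenerationMode`, and the mode names come from this README, but the constructor shape and the `Submit` method are assumptions.

```csharp
// Sketch only: exact LM-Kit.NET signatures may differ.
// RagChat wraps a RagEngine plus a chat model and keeps conversation
// history, so follow-up questions can be rewritten before retrieval.
var chat = new RagChat(ragEngine, chatModel)   // assumed constructor shape
{
    // Contextual = rewrite follow-ups using conversation history.
    // The other modes named in the demo: Original, Multi-Query, HyDE.
    QueryGenerationMode = QueryGenerationMode.Contextual
};

// Turn 1: a self-contained question.
Console.WriteLine(chat.Submit("What navigation modules does NovaPulse sell?"));

// Turn 2: "the cheapest one" is meaningless on its own; contextual
// rewriting turns it into a standalone query (e.g. "What is NovaPulse's
// cheapest navigation module?") before hitting the knowledge base.
Console.WriteLine(chat.Submit("What about the cheapest one?"));
```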

💻 Demo Application Overview

The demo is a console application that walks through the full conversational RAG pipeline:

  1. Select a chat model from a menu (or provide a custom model URI / model ID)
  2. Load a small embedding model (embeddinggemma-300m) for vector search
  3. Index five fictional knowledge articles into a RagEngine
  4. Start a multi-turn conversation powered by RagChat
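
Steps 2–4 above might be wired up like this. `RagEngine`, `GetSampleKnowledge()`, and the embeddinggemma-300m model are named by the demo; the `LM` type, the `ImportText` call, and the parameter names are illustrative assumptions.

```csharp
// Sketch of the indexing phase; only RagEngine, RagChat, and
// GetSampleKnowledge() are names taken from the demo itself.
var embedder = new LM("embeddinggemma-300m");   // small embedding model
var ragEngine = new RagEngine(embedder);

// Index the five fictional NovaPulse articles as separate sections.
foreach (var (title, text) in GetSampleKnowledge())
{
    ragEngine.ImportText(text, sectionIdentifier: title); // assumed method
}

// Hand the engine to RagChat for the multi-turn conversation loop.
var chat = new RagChat(ragEngine, chatModel);
```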

✨ Key Features

  • Model selection menu: choose from Qwen-3 8B, Gemma 3 12B, Phi-4 14.7B, GPT OSS 20B, GLM 4.7 Flash, or enter a custom model
  • Automatic model download: models are fetched on first run with progress feedback
  • In-memory knowledge base: five fictional topics indexed with configurable chunk sizes
  • Four query generation modes: switch at runtime via the /mode command
  • Retrieval telemetry: every query displays partition counts, similarity scores per topic, and retrieval timing
  • Pipeline status feedback: console messages show each phase (rewriting, retrieval, generation) so you always know what the system is doing
  • Interactive commands: /reset, /mode, /topk, /stats, /help

Example Output

  Rewriting query and retrieving relevant context...
    Retrieved 3/3 partitions in 142ms
        Product Catalog: 2 partition(s), best score 0.847
        Company Overview: 1 partition(s), best score 0.612
  Generating response...

  Assistant: NovaPulse offers three navigation modules built on their
  Quantum Pulse Positioning (QPP) platform: the QNav-100 for small
  satellites (CHF 38,000), the QNav-500 for commercial missions
  (CHF 145,000), and the QNav-900X for defense and deep-space
  applications (~CHF 400,000+)...

  [3 partitions | 2048 ctx tokens | quality: 0.92]

🏗️ Architecture

┌──────────────┐     ┌─────────────────────┐     ┌──────────────┐
│  User Input  │────>│  Query Rewriting    │────>│  Retrieval   │
│  (Console)   │     │  (Contextual/HyDE/  │     │  (RagEngine) │
│              │     │   Multi-Query)      │     │              │
└──────────────┘     └─────────────────────┘     └──────┬───────┘
                                                        │
                    ┌──────────────────────┐            │
                    │  Prompt Builder      │<───────────┘
                    │  (query + partitions │
                    │   + history)         │
                    └──────────┬───────────┘
                               │
                    ┌──────────▼──────────┐
                    │  LLM Generation     │
                    │  (streaming tokens) │
                    └─────────────────────┘

⚙️ Getting Started

Prerequisites

  • .NET 8.0 or later
  • Minimum 6 GB VRAM (Qwen-3 8B, default selection)

Run

  1. Clone the repository:

    git clone https://github.com/LM-Kit/lm-kit-net-samples
    
  2. Navigate to the project directory:

    cd lm-kit-net-samples/console_net/rag/conversational_rag
    
  3. Build and run:

    dotnet build
    dotnet run
    
  4. Follow the on-screen prompts to select a model and start asking questions about NovaPulse Technologies.


🔧 Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Out of memory | Model too large for GPU | Select a smaller model (option 0, Qwen-3 8B) |
| Slow first response | Model downloading | Wait for the download to finish (progress is displayed) |
| Irrelevant answers | Low similarity scores | Increase MaxRetrievedPartitions via /topk or try /mode Multi-Query |

🚀 Extend the Demo

  • Import real documents: replace GetSampleKnowledge() with ragEngine.ImportDocument() to ingest PDFs, DOCX, or web pages
  • Try different modes: use /mode to compare Original, Contextual, Multi-Query, and HyDE retrieval
  • Tune retrieval: adjust MaxRetrievedPartitions, MinRelevanceScore, and MmrLambda for your domain
  • Add a system prompt: set chat.SystemPrompt to customize the assistant's personality or domain focus
  • Persist the knowledge base: use ragEngine.Save() and RagEngine.Load() to avoid re-indexing on every run
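
The Save()/Load() round-trip from the last bullet might look like this. `ragEngine.Save()` and `RagEngine.Load()` are named above; the file path, the `Load` overload, and the `IndexSampleKnowledge` helper are hypothetical.

```csharp
// Persist the indexed knowledge base so later runs skip re-indexing.
// Save/Load come from the demo notes; exact signatures are assumptions.
const string dataStorePath = "novapulse.kb";   // arbitrary local path

if (File.Exists(dataStorePath))
{
    // Reload a previously serialized knowledge base.
    ragEngine = RagEngine.Load(dataStorePath);   // assumed overload
}
else
{
    // Hypothetical helper: runs the demo's normal indexing pass.
    IndexSampleKnowledge(ragEngine);
    ragEngine.Save(dataStorePath);
}
```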
