👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/rag/conversational_rag

Conversational RAG for C# .NET Applications


🎯 Purpose of the Demo

The Conversational RAG demo shows how to use the RagChat class from LM-Kit.NET to build a multi-turn Q&A chatbot grounded in a custom knowledge base. Unlike a single-shot question answering system, RagChat maintains conversation history across turns, automatically rewrites follow-up questions into self-contained retrieval queries, and supports four distinct query generation strategies (Original, Contextual, Multi-Query, and HyDE).

The knowledge base is entirely fictional (a made-up company called "NovaPulse Technologies"), so every correct answer must come from retrieval. This makes it easy to verify that the RAG pipeline is working: if the model answers with specific NovaPulse facts, the retrieval pipeline found the right context.


👥 Who Should Use This Demo

This demo is ideal for developers and teams who need to:

  • Build internal Q&A systems: answer employee questions from HR handbooks, product documentation, or knowledge bases.
  • Create customer support bots: ground chatbot responses in verified company information instead of relying on model memory.
  • Prototype multi-turn RAG: experiment with contextual query rewriting, Multi-Query retrieval, and HyDE before integrating into production.
  • Evaluate retrieval strategies: compare how different QueryGenerationMode options affect answer quality and relevance.

🚀 What Problem It Solves

Standard chat completions hallucinate when asked about private or recent information that is not part of the model's training data. A single-turn RAG pipeline solves this for isolated questions, but multi-turn conversations introduce a new challenge: follow-up questions like "What about the cheapest one?" lose their meaning without the prior context.

RagChat solves both problems in a single class:

  1. Grounding: every response is backed by retrieved text partitions from your knowledge base.
  2. Contextual rewriting: follow-up questions are automatically reformulated into standalone queries using conversation history, so retrieval stays accurate across turns.
  3. Switchable strategies: choose between Original, Contextual, Multi-Query (with Reciprocal Rank Fusion), and HyDE at runtime.
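
In code, these three capabilities might come together roughly as follows. This is an illustrative sketch, not the verified LM-Kit.NET API: `RagChat`, `QueryGenerationMode`, and the mode names come from this README, but the constructor shape and the `Submit` method are assumptions.

```csharp
// Sketch only: exact LM-Kit.NET signatures may differ.
// RagChat wraps a RagEngine plus a chat model and keeps conversation
// history, so follow-up questions can be rewritten before retrieval.
var chat = new RagChat(ragEngine, chatModel)   // assumed constructor shape
{
    // Contextual = rewrite follow-ups using conversation history.
    // The other modes named in the demo: Original, Multi-Query, HyDE.
    QueryGenerationMode = QueryGenerationMode.Contextual
};

// Turn 1: a self-contained question.
Console.WriteLine(chat.Submit("What navigation modules does NovaPulse sell?"));

// Turn 2: "the cheapest one" is meaningless on its own; contextual
// rewriting turns it into a standalone query (e.g. "What is NovaPulse's
// cheapest navigation module?") before hitting the knowledge base.
Console.WriteLine(chat.Submit("What about the cheapest one?"));
```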

💻 Demo Application Overview

The demo is a console application that walks through the full conversational RAG pipeline:

  1. Select a chat model from a menu (or provide a custom model URI / model ID)
  2. Load a small embedding model (embeddinggemma-300m) for vector search
  3. Index five fictional knowledge articles into a RagEngine
  4. Start a multi-turn conversation powered by RagChat
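
Steps 2–4 above might be wired up like this. `RagEngine`, `GetSampleKnowledge()`, and the embeddinggemma-300m model are named by the demo; the `LM` type, the `ImportText` call, and the parameter names are illustrative assumptions.

```csharp
// Sketch of the indexing phase; only RagEngine, RagChat, and
// GetSampleKnowledge() are names taken from the demo itself.
var embedder = new LM("embeddinggemma-300m");   // small embedding model
var ragEngine = new RagEngine(embedder);

// Index the five fictional NovaPulse articles as separate sections.
foreach (var (title, text) in GetSampleKnowledge())
{
    ragEngine.ImportText(text, sectionIdentifier: title); // assumed method
}

// Hand the engine to RagChat for the multi-turn conversation loop.
var chat = new RagChat(ragEngine, chatModel);
```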

✨ Key Features

  • Model selection menu: choose from Qwen-3 8B, Gemma 3 12B, Phi-4 14.7B, GPT OSS 20B, GLM 4.7 Flash, or enter a custom model
  • Automatic model download: models are fetched on first run with progress feedback
  • In-memory knowledge base: five fictional topics indexed with configurable chunk sizes
  • Four query generation modes: switch at runtime via the /mode command
  • Retrieval telemetry: every query displays partition counts, similarity scores per topic, and retrieval timing
  • Pipeline status feedback: console messages show each phase (rewriting, retrieval, generation) so you always know what the system is doing
  • Interactive commands: /reset, /mode, /topk, /stats, /help

Example Output

  Rewriting query and retrieving relevant context...
    Retrieved 3/3 partitions in 142ms
        Product Catalog: 2 partition(s), best score 0.847
        Company Overview: 1 partition(s), best score 0.612
  Generating response...

  Assistant: NovaPulse offers three navigation modules built on their
  Quantum Pulse Positioning (QPP) platform: the QNav-100 for small
  satellites (CHF 38,000), the QNav-500 for commercial missions
  (CHF 145,000), and the QNav-900X for defense and deep-space
  applications (~CHF 400,000+)...

  [3 partitions | 2048 ctx tokens | quality: 0.92]

🏗️ Architecture

┌──────────────┐     ┌─────────────────────┐     ┌──────────────┐
│  User Input  │────>│  Query Rewriting    │────>│  Retrieval   │
│  (Console)   │     │  (Contextual/HyDE/  │     │  (RagEngine) │
│              │     │   Multi-Query)      │     │              │
└──────────────┘     └─────────────────────┘     └──────┬───────┘
                                                        │
                    ┌──────────────────────┐            │
                    │  Prompt Builder      │<───────────┘
                    │  (query + partitions │
                    │   + history)         │
                    └──────────┬───────────┘
                               │
                    ┌──────────▼──────────┐
                    │  LLM Generation     │
                    │  (streaming tokens) │
                    └─────────────────────┘

⚙️ Getting Started

Prerequisites

  • .NET 8.0 or later
  • Minimum 6 GB VRAM (Qwen-3 8B, default selection)

Run

  1. Clone the repository:

    git clone https://github.com/LM-Kit/lm-kit-net-samples
    
  2. Navigate to the project directory:

    cd lm-kit-net-samples/console_net/rag/conversational_rag
    
  3. Build and run:

    dotnet build
    dotnet run
    
  4. Follow the on-screen prompts to select a model and start asking questions about NovaPulse Technologies.


🔧 Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Out of memory | Model too large for GPU | Select a smaller model (option 0, Qwen-3 8B) |
| Slow first response | Model downloading | Wait for the download to finish (progress is displayed) |
| Irrelevant answers | Low similarity scores | Increase MaxRetrievedPartitions via /topk or try /mode Multi-Query |

🚀 Extend the Demo

  • Import real documents: replace GetSampleKnowledge() with ragEngine.ImportDocument() to ingest PDFs, DOCX, or web pages
  • Try different modes: use /mode to compare Original, Contextual, Multi-Query, and HyDE retrieval
  • Tune retrieval: adjust MaxRetrievedPartitions, MinRelevanceScore, and MmrLambda for your domain
  • Add a system prompt: set chat.SystemPrompt to customize the assistant's personality or domain focus
  • Persist the knowledge base: use ragEngine.Save() and RagEngine.Load() to avoid re-indexing on every run
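
The Save()/Load() round-trip from the last bullet might look like this. `ragEngine.Save()` and `RagEngine.Load()` are named above; the file path, the `Load` overload, and the `IndexSampleKnowledge` helper are hypothetical.

```csharp
// Persist the indexed knowledge base so later runs skip re-indexing.
// Save/Load come from the demo notes; exact signatures are assumptions.
const string dataStorePath = "novapulse.kb";   // arbitrary local path

if (File.Exists(dataStorePath))
{
    // Reload a previously serialized knowledge base.
    ragEngine = RagEngine.Load(dataStorePath);   // assumed overload
}
else
{
    // Hypothetical helper: runs the demo's normal indexing pass.
    IndexSampleKnowledge(ragEngine);
    ragEngine.Save(dataStorePath);
}
```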
