Build Conversational RAG with RagChat

A basic RAG pipeline answers isolated questions, but real users ask follow-up questions that depend on conversation history. "What were Q4 revenues?" followed by "How does that compare to Q3?" requires the system to understand that "that" refers to Q4 revenues. RagChat wraps RagEngine with built-in conversation management, query contextualization, and configurable query generation modes so that each follow-up retrieves the right passages automatically.

This tutorial builds a multi-turn document Q&A system where follow-up questions are handled transparently.


Why This Matters

Two enterprise problems that conversational RAG solves:

  1. Support agents drilling into complex cases. A support engineer asks about an error, then follows up with "Which versions are affected?" and "What is the workaround?" Each question depends on the previous context. Without conversation-aware retrieval, the system retrieves unrelated passages because the follow-up questions lack sufficient standalone context.
  2. Analysts exploring large document collections. A financial analyst asks "What is the company's debt-to-equity ratio?" then "How has it changed year over year?" The second query is meaningless without the first. RagChat rewrites it into a self-contained query before retrieval.

Prerequisites

Requirement  Minimum
.NET SDK     8.0+
RAM          16 GB recommended
VRAM         6 GB (for both models simultaneously)
Disk         ~4 GB free for model downloads

You should be familiar with the foundational RAG pipeline before starting this tutorial.


Step 1: Create the Project

dotnet new console -n ConversationalRag
cd ConversationalRag
dotnet add package LM-Kit.NET

Step 2: Understand the Architecture

┌──────────────────────────────────────────────────────────────────┐
│                           RagChat                                │
│                                                                  │
│  ChatHistory ──► QueryGenerationMode ──► RagEngine ──► LLM       │
│                                                                  │
│  Modes:                                                          │
│    Original ............. use query as-is                        │
│    Contextual ........... rewrite with history context           │
│    MultiQuery ........... generate N query variants + RRF        │
│    HypotheticalAnswer ... generate hypothetical answer (HyDE)    │
└──────────────────────────────────────────────────────────────────┘

Key classes:

Class                Role
RagChat              Multi-turn conversational interface over RagEngine
RagEngine            Core retrieval and indexing engine
QueryGenerationMode  Controls how queries are transformed before retrieval

Step 3: Build a Conversational RAG System

using System.Text;
using LMKit.Data;
using LMKit.Model;
using LMKit.Retrieval;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load models
// ──────────────────────────────────────
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine(" Done.\n");

// ──────────────────────────────────────
// 2. Build the RAG engine and index documents
// ──────────────────────────────────────
var dataSource = DataSource.CreateInMemoryDataSource("KnowledgeBase", embeddingModel);
var rag = new RagEngine(embeddingModel);
rag.AddDataSource(dataSource);
rag.DefaultIChunking = new TextChunking { MaxChunkSize = 500, MaxOverlapSize = 50 };

// Index some sample content
string[] sections =
{
    "Q4 2024 revenue was $12.3M, a 15% increase over Q3 2024 revenue of $10.7M. " +
    "The growth was driven by enterprise license sales in the EMEA region.",

    "Q3 2024 revenue was $10.7M, primarily from North American cloud subscriptions. " +
    "Operating expenses were $8.1M, resulting in an operating margin of 24.3%.",

    "The company's debt-to-equity ratio improved from 1.8 in Q3 to 1.5 in Q4, " +
    "driven by accelerated debt repayment and retained earnings growth."
};

for (int i = 0; i < sections.Length; i++)
    rag.ImportText(sections[i], "KnowledgeBase", $"financial-report-section-{i}");

Console.WriteLine($"Indexed {sections.Length} sections.\n");

// ──────────────────────────────────────
// 3. Create RagChat with contextual query rewriting
// ──────────────────────────────────────
using var ragChat = new RagChat(rag, chatModel)
{
    QueryGenerationMode = QueryGenerationMode.Contextual,
    MaxRetrievedPartitions = 3,
    MinRelevanceScore = 0.3f,
    SystemPrompt = "Answer the question using only the provided context. " +
                   "If the context does not contain the answer, say so.",
    MaximumCompletionTokens = 512
};

// Stream tokens as they are generated
ragChat.AfterTextCompletion += (_, e) =>
{
    Console.Write(e.Text);
};

// Monitor retrieval
ragChat.RetrievalCompleted += (_, e) =>
{
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.WriteLine($"  [Retrieved {e.Partitions.Count} passages]");
    Console.ResetColor();
};

// ──────────────────────────────────────
// 4. Multi-turn conversation loop
// ──────────────────────────────────────
Console.WriteLine("Ask questions about the financial reports (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    RagQueryResult result = ragChat.Submit(input);
    Console.WriteLine($"\n  [{result.GeneratedTokenCount} tokens, {result.TokenGenerationRate:F1} tok/s]\n");
}

Step 4: Run and Test Multi-Turn Retrieval

dotnet run

Example session showing how contextual mode rewrites follow-up queries:

Indexed 3 sections.

Ask questions about the financial reports (or 'quit' to exit):

You: What was Q4 revenue?
  [Retrieved 2 passages]
Assistant: Q4 2024 revenue was $12.3M, a 15% increase over Q3.
  [38 tokens, 41.2 tok/s]

You: How does that compare to Q3?
  [Retrieved 2 passages]
Assistant: Q4 revenue of $12.3M was 15% higher than Q3 revenue of $10.7M.
The growth was driven by enterprise license sales in EMEA.
  [52 tokens, 39.8 tok/s]

You: What about the debt-to-equity ratio?
  [Retrieved 1 passages]
Assistant: The debt-to-equity ratio improved from 1.8 in Q3 to 1.5 in Q4,
driven by accelerated debt repayment and retained earnings growth.
  [47 tokens, 40.1 tok/s]

Without contextual mode, the query "How does that compare to Q3?" would be sent to the retrieval engine as-is, and "that" would match poorly against any document. With QueryGenerationMode.Contextual, RagChat rewrites it into something like "How does Q4 2024 revenue compare to Q3 2024 revenue?" before retrieval.


Step 5: Configure Query Contextualization

Fine-tune how the contextualizer uses conversation history:

ragChat.QueryGenerationMode = QueryGenerationMode.Contextual;
ragChat.ContextualizationOptions.MaxHistoryTurns = 5;       // Look back 5 turns (default: 10)
ragChat.ContextualizationOptions.MaxCompletionTokens = 128; // Token budget for rewrite (default: 256)

Property             Default  Effect
MaxHistoryTurns      10       Number of previous turns used for context. Lower values reduce latency.
MaxCompletionTokens  256      Token budget for the rewritten query. Keep low for simple rewrites.

Step 6: Switch to Advanced Query Modes

RagChat supports four query generation modes. You can switch modes at any point during a conversation:

Multi-Query Mode

Generates multiple query variants and merges their results with Reciprocal Rank Fusion (RRF). Improves recall for ambiguous or broad queries.

ragChat.QueryGenerationMode = QueryGenerationMode.MultiQuery;
ragChat.MultiQueryOptions.QueryVariantCount = 4;          // Generate 4 variants (default: 3)
ragChat.MultiQueryOptions.MaxCompletionTokens = 256;      // Token budget per generation

HyDE Mode

Generates a hypothetical answer first, embeds it, and uses that embedding for retrieval. Bridges the vocabulary gap between short questions and long document passages.

ragChat.QueryGenerationMode = QueryGenerationMode.HypotheticalAnswer;
ragChat.HydeOptions.MaxCompletionTokens = 512;  // Token budget for hypothesis

See Improve Recall with Multi-Query and HyDE Retrieval for a detailed comparison of these modes.
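Since modes can be switched between turns, one practical pattern is to open a broad topic with MultiQuery and drop back to Contextual for narrow follow-ups. The sketch below reuses the ragChat instance from Step 3; the questions are illustrative:

// Broad opening question: cast a wide net with several query variants.
ragChat.QueryGenerationMode = QueryGenerationMode.MultiQuery;
ragChat.Submit("Give me an overview of the company's financial performance.");

// Follow-ups are narrow; a single contextual rewrite is cheaper and usually sufficient.
ragChat.QueryGenerationMode = QueryGenerationMode.Contextual;
ragChat.Submit("What drove the revenue growth?");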


Step 7: Combine with Hybrid Retrieval

RagChat inherits the retrieval strategy from its RagEngine. To use hybrid search with conversational RAG:

rag.RetrievalStrategy = new HybridRetrievalStrategy();

using var ragChat = new RagChat(rag, chatModel)
{
    QueryGenerationMode = QueryGenerationMode.Contextual,
    MaxRetrievedPartitions = 5,
    MinRelevanceScore = 0.2f
};

This gives you multi-turn conversation with contextual rewriting and hybrid (vector + BM25) retrieval in a single setup.


Step 8: Manage Conversation State

Clear History

Reset the conversation to start fresh:

ragChat.ClearHistory();

Access Chat History

Inspect the conversation state programmatically:

foreach (var message in ragChat.ChatHistory)
{
    Console.WriteLine($"[{message.AuthorRole}]: {message.Content}");
}
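Long sessions accumulate stale turns that can degrade query rewriting. One option is to reset once the history passes a size threshold. The sketch below assumes ChatHistory is enumerable (as the loop above implies) and counts it with LINQ; the threshold of 20 messages is illustrative:

using System.Linq;

// Reset the conversation once it grows beyond a fixed number of messages,
// so stale turns stop influencing contextual query rewriting.
const int MaxMessages = 20;
if (ragChat.ChatHistory.Count() >= MaxMessages)
{
    ragChat.ClearHistory();
    Console.WriteLine("(conversation reset)");
}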

Custom Prompt Templates

Override how retrieved context is injected into the prompt:

ragChat.PromptTemplate = @"Use the following reference material to answer the user's question.
If the material does not contain the answer, state that clearly.

## Reference Material:
@context

## User Question:
@question";

The placeholders @context and @question are replaced automatically.


Common Issues

  1. Follow-ups retrieve irrelevant passages. Cause: QueryGenerationMode is set to Original. Fix: switch to Contextual for conversation-aware retrieval.
  2. Rewritten query too verbose. Cause: MaxCompletionTokens is too high. Fix: reduce ContextualizationOptions.MaxCompletionTokens to 128.
  3. History context too stale. Cause: too many turns kept in memory. Fix: lower MaxHistoryTurns to 3 or 5.
  4. Slow responses. Cause: MultiQuery and HyDE modes make multiple LLM calls per query. Fix: use Contextual mode for lower latency, or reduce QueryVariantCount.
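
For the latency issue in particular, the individual fixes can be combined. This sketch only uses the properties shown earlier in this tutorial:

// Low-latency configuration: one rewrite call per turn, terse rewrites, short history.
ragChat.QueryGenerationMode = QueryGenerationMode.Contextual;
ragChat.ContextualizationOptions.MaxCompletionTokens = 128; // terse rewrites
ragChat.ContextualizationOptions.MaxHistoryTurns = 5;       // fresher, smaller context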

Next Steps
