Class RagChat

Namespace: LMKit.Retrieval

Assembly: LM-Kit.NET.dll

Provides multi-turn conversational question-answering over a user-managed RagEngine, combining retrieval, query contextualization, and grounded response generation in a single turnkey interface.

public sealed class RagChat : IMultiTurnConversation, IConversation, IKVCache, IDisposable

Inheritance: object

RagChat

Implements: IMultiTurnConversation

IConversation

IKVCache

IDisposable

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Examples

// Set up a RAG engine with your data
var ragEngine = new RagEngine(embeddingModel);
ragEngine.ImportText("LM-Kit supports local LLM inference, RAG, and embeddings.");

// Create a conversational RAG chatbot
using var chat = new RagChat(ragEngine, chatModel);

// Optionally tune retrieval
chat.QueryGenerationMode = QueryGenerationMode.MultiQuery;
chat.MaxRetrievedPartitions = 10;
chat.Engine.MmrLambda = 0.7f;

// Stream responses
chat.AfterTextCompletion += (s, e) => Console.Write(e.Text);

// Ask questions with automatic context retrieval
var result = await chat.SubmitAsync("What capabilities does LM-Kit provide?");
Console.WriteLine(result.Response.Completion);

// Follow-up questions maintain conversation context
result = await chat.SubmitAsync("Tell me more about the RAG support.");

Remarks

RagChat is the general-purpose counterpart to PdfChat: while PdfChat handles the full document lifecycle (import, chunking, embedding, retrieval, and conversation), RagChat operates on a pre-populated RagEngine that the caller owns and manages. This makes it suitable for any RAG scenario: custom knowledge bases, multi-source corpora, or pipelines where data ingestion is handled separately.

On each call to SubmitAsync(string, CancellationToken), the class orchestrates:

Optional query contextualization using conversation history.
Retrieval dispatch based on QueryGenerationMode (original, contextual, multi-query, or HyDE).
Prompt construction with retrieved context via LMKit.Retrieval.RagPromptBuilder.
Response generation through an internal MultiTurnConversation.

The caller retains full control over the RagEngine (data sources, retrieval strategy, reranker, context window, MMR) via the Engine property. Advanced conversation features (tools, memory, skills) are accessible directly through the corresponding properties on this class.

Constructors

RagChat(RagEngine, LM, int): Initializes a new instance of the RagChat class.

Properties

ChatHistory: Gets the conversation history containing all exchanged messages.

ContextRemainingSpace: Remaining token budget currently available in the context window.

ContextSize: Gets the total token context size available for this conversation.

ContextualizationOptions: Gets the options that control how follow-up questions are reformulated when QueryGenerationMode is set to Contextual.

Engine: Gets the RagEngine used for retrieval.

HydeOptions: Gets the options that control how hypothetical answers are generated when QueryGenerationMode is set to HypotheticalAnswer.

ImageDetail: Gets or sets the level of detail used when processing images for vision models. Controls the maximum pixel budget allocated to images, which directly affects token consumption and visual fidelity. Default is High.

MaxRetrievedPartitions: Gets or sets the maximum number of partitions retrieved per query.

MaximumCompletionTokens: Gets or sets the maximum number of tokens to generate per response.

MaximumRecallTokens

Maximum number of tokens recalled from Memory per turn.

Defaults to ContextSize / 4. The effective value is automatically capped to at most ContextSize / 2.

Memory

Long-term memory store used to recall relevant context across turns.

Assign an AgentMemory implementation to enable retrieval of relevant text partitions. Retrieved snippets are injected as hidden context up to MaximumRecallTokens.

MinRelevanceScore: Gets or sets the minimum relevance score for retrieved partitions.

Model: Gets the language model used for generating responses.

MultiQueryOptions: Gets the options that control how query variants are generated when QueryGenerationMode is set to MultiQuery.

PromptTemplate: Gets or sets the prompt template used to inject retrieved context into the query.

QueryGenerationMode: Gets or sets the mode used to generate retrieval queries from user input.

ReasoningLevel: Gets or sets the reasoning level used during response generation.

RepetitionPenalty: Gets the repetition penalty configuration used to reduce repetitive outputs.

SamplingMode: Gets or sets the token sampling strategy for text generation.

Skills

Registry of Agent Skills available to this conversation.

Skills provide modular capabilities with specialized knowledge and workflows, following the Agent Skills specification.

SystemPrompt: Gets or sets the system prompt that defines the assistant's behavior.

ToolPolicy

Per-turn tool-calling policy used by the conversation runtime.

Controls whether tools are allowed, required, disabled, or whether a specific tool must be used on the current turn.

Tools

Registry of model-callable tools available to this conversation.

Register tools before the first user turn so they are advertised to the model. Tool invocation requires a model that supports tool calls.

Methods

ClearHistory(): Clears the conversation history, resetting the multi-turn state.

Dispose(): Releases all resources used by this instance.

Submit(string, CancellationToken): Submits a question and returns the generated response grounded in retrieved context.

SubmitAsync(string, CancellationToken): Asynchronously submits a question and returns the generated response grounded in retrieved context.

Events

AfterTextCompletion: Occurs during response generation as text is produced.

AfterToolInvocation: Fired after a tool invocation finishes (or when it was cancelled/errored).

BeforeToolInvocation: Fired before a tool invocation. Handlers may cancel the call.

MemoryRecall

Fired when one or more memory partitions are recalled for this turn.

Subscribers may inspect the recalled content and optionally cancel injection by setting Cancel to true.

RetrievalCompleted: Occurs when partition retrieval completes for a query, before response generation begins.

ToolApprovalRequired: Fired when a tool invocation requires user approval before execution.

Table of Contents