Class MultiTurnConversation

Namespace: LMKit.TextGeneration
Assembly: LM-Kit.NET.dll

High-level, production-ready conversation runtime for multi-turn chat.

MultiTurnConversation wraps a local language model and maintains the running conversation state (messages, system prompt, tool-calls, memory recall, etc.). It exposes a compact API to:

  • submit user prompts (sync/async),
  • regenerate or continue the last assistant answer,
  • register model-callable tools and control per-turn tool policy,
  • inject long-term AgentMemory and cap recall tokens,
  • configure sampling (temperature/top-p/etc.) and repetition penalties,
  • enforce structure with Grammar (mutually exclusive with tools),
  • and persist/restore full chat sessions.

Threading model: generation operations are serialized internally, so only one call runs at a time per instance. Create one instance per independent conversation; the underlying LM can be shared across conversations if desired (see the final example below).

public sealed class MultiTurnConversation : IMultiTurnConversation, IConversation, ITextGenerationSettings, IDisposable
Inheritance
object → MultiTurnConversation

Implements
IMultiTurnConversation, IConversation, ITextGenerationSettings, IDisposable

Examples

Example: Basic multi-turn conversation

using LMKit.Model;
using LMKit.TextGeneration;
using System;

// Load the language model
LM model = LM.LoadFromModelID("llama-3.2-1b");

// Create a multi-turn conversation
using MultiTurnConversation chat = new MultiTurnConversation(model);

// Set a system prompt
chat.SystemPrompt = "You are a helpful assistant.";

// First turn
var result1 = chat.Submit("What is the capital of France?");
Console.WriteLine($"Assistant: {result1.Content}");

// Second turn - context is preserved
var result2 = chat.Submit("What is its population?");
Console.WriteLine($"Assistant: {result2.Content}");

// View conversation history
Console.WriteLine($"Total messages: {chat.ChatHistory.Messages.Count}");

Example: Conversation with tools

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.Agents.Tools;
using System;

LM model = LM.LoadFromModelID("llama-3.2-1b");

using MultiTurnConversation chat = new MultiTurnConversation(model);
chat.SystemPrompt = "You are a technical assistant with access to tools.";

// Register tools if model supports them
if (model.HasToolCalls)
{
    chat.Tools.Register(new WebSearchTool());
    chat.ToolPolicy.Choice = ToolChoice.Auto;
}

var result = chat.Submit("Search for the latest news on AI.");
Console.WriteLine(result.Content);

Example: Save and restore session

using LMKit.Model;
using LMKit.TextGeneration;
using System;

LM model = LM.LoadFromModelID("llama-3.2-1b");

// Start a conversation and save it
using (var chat = new MultiTurnConversation(model))
{
    chat.Submit("Remember that my name is Alice.");
    chat.SaveSession("session.bin");
}

// Later, restore the conversation
using (var restoredChat = new MultiTurnConversation(model, "session.bin"))
{
    var result = restoredChat.Submit("What is my name?");
    Console.WriteLine(result.Content); // Should remember "Alice"
}
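
Example: Sharing one model across conversations

A sketch assembled from the calls shown above: one LM instance backs two independent conversations, each with its own history and system prompt.

using LMKit.Model;
using LMKit.TextGeneration;
using System;

// One loaded model can back several conversations; each instance keeps its own state,
// and generation calls on a given instance are serialized internally.
LM model = LM.LoadFromModelID("llama-3.2-1b");

using MultiTurnConversation supportChat = new MultiTurnConversation(model);
using MultiTurnConversation salesChat = new MultiTurnConversation(model);

supportChat.SystemPrompt = "You are a support agent.";
salesChat.SystemPrompt = "You are a sales assistant.";

Console.WriteLine(supportChat.Submit("How do I reset my password?").Content);
Console.WriteLine(salesChat.Submit("Draft a one-sentence pitch for our Pro plan.").Content);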

Constructors

MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings)

Continue a conversation from an existing ChatHistory.

Use this when you want to resume an earlier interaction without loading a serialized session file. The provided history is cloned, and the model reference is normalized for this instance.
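
Example (sketch; model and chat come from the examples above, and the contextSize/settings arguments are assumed to accept -1 and null as "use the defaults"):
// Branch a second conversation from the accumulated history without touching the original.
using var branched = new MultiTurnConversation(model, chat.ChatHistory, -1, null);
var alternative = branched.Submit("Answer the previous question again, but more concisely.");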

MultiTurnConversation(LM, byte[])

Restore a previous conversation from serialized bytes.

MultiTurnConversation(LM, int)

Create a new conversation with an LM and optional custom context size.

Pass contextSize = -1 (the default) to let LM-Kit pick an optimal size based on hardware and model settings. Any other value is clamped to the model's supported limits and a minimum workable configuration.
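
Example (model as in the examples above):
// Pin the context window to 4096 tokens; pass -1 (the default) to let LM-Kit choose.
using var chat = new MultiTurnConversation(model, 4096);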

MultiTurnConversation(LM, string)

Restore a previous conversation from a session file.

Properties

ChatHistory

The full chat history of this conversation (system, user, assistant, and tool messages).

The runtime appends messages as you call Submit(string, CancellationToken) and related methods. You can snapshot/clone and reuse histories across conversations via the MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings) constructor.

ContextRemainingSpace

Remaining token budget currently available in the context window.

ContextSize

Total token context size for this conversation (i.e., model's window for prompt + response).

Grammar

Grammar rules used to constrain model output (e.g., to produce valid JSON). When set to non-null, repetition penalties are disabled by default to prevent conflicts. Incompatibility: Grammar and tool-calls cannot be used together. If any tool is registered in Tools while Grammar is non-null, Submit(string, CancellationToken) throws InvalidOperationException.
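
Example (sketch; jsonGrammar stands in for a Grammar instance whose construction is not covered on this page):
// Enable structured output. No tool may be registered while a grammar is active.
conversation.Grammar = jsonGrammar;
var structured = conversation.Submit("Return the user profile as a JSON object.");

// Clear the grammar before registering any tools.
conversation.Grammar = null;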

InferencePolicies

Policies that govern inference behavior (e.g., input overflow handling).

LogitBias

Logit bias adjustments applied during generation.

Use to nudge the model toward/away from specific tokens. You can also modify bias dynamically via BeforeTokenSampling.

MaximumCompletionTokens

Maximum tokens allowed for the assistant completion per turn.

Default is 2048. Set to -1 to disable the limit entirely (subject to context capacity).

MaximumRecallTokens

Maximum number of tokens recalled from Memory per turn.

Defaults to ContextSize / 4. The effective value is automatically capped to at most ContextSize / 2.

Memory

Long-term memory store used to recall relevant context across turns.

Assign an AgentMemory implementation to enable retrieval of relevant text partitions (e.g., user docs, FAQs). Retrieved snippets are injected as hidden context up to MaximumRecallTokens.
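
Example (sketch; memory is assumed to be an AgentMemory already populated with your documents):
// Enable long-term recall and cap how much of the context window it may consume.
conversation.Memory = memory;
conversation.MaximumRecallTokens = conversation.ContextSize / 4; // the runtime caps this at ContextSize / 2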

Model

The underlying language model used by this conversation.

The same LM instance can be shared across conversations. This property is provided for inspection (e.g., capabilities such as HasToolCalls).

ReasoningLevel

Controls how (and whether) intermediate "reasoning"/"thinking" content is produced and/or exposed.

Use None to disable reasoning entirely. Higher levels hint that the model should allocate more budget to chain-of-thought-style tokens; actual support depends on the model and its chat template.

Typical semantics:

Level     Intended behavior
None      No reasoning tokens requested or exposed.
Low       Minimal reasoning; terse scratch space when helpful.
Medium    Balanced reasoning (default if enabled).
High      Maximize reasoning depth; may trade off speed.
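
Example (the member names follow the table above; the enum type name is assumed to match the property):
// Disable intermediate reasoning tokens entirely.
conversation.ReasoningLevel = ReasoningLevel.None;
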
RepetitionPenalty

Repetition penalty configuration used during generation.

Adjust to discourage the model from repeating recent n-grams/tokens. Disabled automatically when Grammar is set (to avoid conflicts), but you can re-enable manually if needed.

SamplingMode

Token sampling strategy (e.g., temperature, top-p, top-k, dynamic sampling).

You can swap in a custom TokenSampling strategy at runtime.
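
Example (sketch; the RandomSampling type and its Temperature/TopP properties are assumptions, not documented on this page):
// Swap in a softer, more creative sampling strategy.
conversation.SamplingMode = new RandomSampling
{
    Temperature = 0.7f,
    TopP = 0.9f
};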

Skills

Registry of Agent Skills available to this conversation.

Skills provide modular capabilities with specialized knowledge and workflows, following the Agent Skills specification.

StopSequences

Sequences that cause generation to stop immediately when encountered.

Matching stop sequences are not included in the final output.
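
Example (assuming the property exposes a mutable string collection):
// Stop as soon as the closing tag appears; the tag itself is not included in the output.
conversation.StopSequences.Add("</answer>");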

SystemPrompt

System prompt applied to the model before the first user turn.

Set this property before the first user message. After a chat has started, the system prompt becomes immutable for the lifetime of this MultiTurnConversation. Create a new instance if you need a different system prompt midstream.

ToolPolicy

Per-turn tool-calling policy used by the conversation runtime.

Controls whether tools are allowed, required, disabled, or whether a specific tool must be used on the current turn. This object guides the runtime; it does not enforce behavior on its own.

Defaults to Auto (the model may or may not call a tool). When using Specific, also set ForcedToolName. Keep AllowParallelCalls = false unless your tools are idempotent and thread-safe.

Example:
conversation.ToolPolicy.Choice = ToolChoice.Specific;
conversation.ToolPolicy.ForcedToolName = "web_search";
conversation.ToolPolicy.MaxCallsPerTurn = 2;

Tools

Registry of model-callable tools available to this conversation.

Register tools before the first user turn so they are advertised to the model. Tool invocation requires a model that supports tool calls (HasToolCalls).

Important: Grammar-constrained generation and tool-calls are mutually exclusive. If any tool is registered and Grammar is non-null, submission throws InvalidOperationException.

Example:
if (conversation.Model.HasToolCalls)
{
    conversation.Tools.Register(new WebSearchTool());
    conversation.ToolPolicy.Choice = ToolChoice.Auto;
}
else
{
    conversation.ToolPolicy.Choice = ToolChoice.None; // avoid tool-call attempts
}

Methods

ClearHistory()

Clear the entire conversation: removes all messages and resets internal state.

Use to start fresh while keeping the same model and configuration. The next submission will behave like a brand new session (including re-applying the SystemPrompt if set).

ContinueLastAssistantResponse(CancellationToken)

Continue generating more tokens for the last assistant response (no new user input).

ContinueLastAssistantResponseAsync(CancellationToken)

Asynchronously continue the last assistant message without any new user input.

Useful to let the model finish or expand an answer that was intentionally constrained by MaximumCompletionTokens. Not supported when Grammar is set.
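
Example (sketch; the continue methods are assumed to return the same completion result type as Submit):
// Produce an answer in bounded chunks.
conversation.MaximumCompletionTokens = 256;
var firstPart = conversation.Submit("Explain the transformer architecture in detail.");

// No new user turn: ask the model to keep going from where it stopped.
var continuation = await conversation.ContinueLastAssistantResponseAsync(CancellationToken.None);
Console.WriteLine(firstPart.Content + continuation.Content);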

Dispose()

Dispose this conversation and release resources.

After disposal, any further method calls throw ObjectDisposedException.

~MultiTurnConversation()

Finalizer to ensure unmanaged resources are released if Dispose() was not called.

RegenerateResponse(CancellationToken)

Regenerate a fresh response to the most recent user message without altering prior turns.

The previous assistant answer for that user message is replaced by a new one. Use this when you want an alternative phrasing or reasoning path.

RegenerateResponseAsync(CancellationToken)

Asynchronously regenerate a response to the most recent user message.

The previous answer is not kept as an additional entry in history; it is replaced on the same turn by the new one.
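
Example (sketch; the regenerate methods are assumed to return the same result type as Submit):
var first = conversation.Submit("Suggest a name for the project.");

// Request a different answer to the same user message; prior turns are untouched.
var second = await conversation.RegenerateResponseAsync(CancellationToken.None);
Console.WriteLine(second.Content);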

SaveSession()

Save the current session state (messages + runtime state) to bytes.

Use together with the MultiTurnConversation(LM, byte[]) constructor to restore later.
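
Example (assuming SaveSession() returns the serialized bytes; model as in the examples above):
byte[] state = conversation.SaveSession();

// Later: rebuild the conversation from the same model and the saved bytes.
using var resumed = new MultiTurnConversation(model, state);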

SaveSession(string)

Save the current session state to a file on disk.

Use together with the MultiTurnConversation(LM, string) constructor to restore later.

Submit(Message, CancellationToken)

Submit a ChatHistory.Message (text and/or attachments) synchronously.

Use this overload when you need to control advanced prompt fields (e.g., NullOnDoubt or auxiliary content).

Submit(string, CancellationToken)

Submit a user prompt (string) and get a completion synchronously.

SubmitAsync(Message, CancellationToken)

Submit a ChatHistory.Message (text and/or attachments) asynchronously.

SubmitAsync(string, CancellationToken)

Submit a user prompt (string) asynchronously.
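
Example (snippet; the CancellationTokenSource comes from System.Threading):
// Cancel automatically if generation takes longer than 30 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var result = await conversation.SubmitAsync("Summarize our discussion so far.", cts.Token);
Console.WriteLine(result.Content);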

Events

AfterTextCompletion

Fired right after a completion finishes.

Use this event to inspect the full assistant output and optionally request to stop further post-processing by setting Stop.

AfterTokenSampling

Fired immediately after a token is selected.

Handlers can override the chosen token, stop generation, or keep the last token out of the final output (useful for control tokens).

AfterToolInvocation

Fired after a tool invocation finishes (or when it was cancelled/errored).

BeforeTokenSampling

Fired just before the runtime samples the next token.

Use this to dynamically adjust sampling parameters or logit bias during generation. Setting Stop requests early stop.

BeforeToolInvocation

Fired before a tool invocation. Handlers may cancel the call.

MemoryRecall

Fired when one or more memory partitions are recalled for this turn.

Subscribers may inspect the recalled content and optionally cancel injection by setting Cancel to true. You can also prepend a custom prefix (e.g., a section header) via Prefix.
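
Example (sketch; the handler follows the standard .NET event pattern, and Cancel/Prefix are the members described above):
conversation.MemoryRecall += (sender, e) =>
{
    // Label the injected snippets so the model can tell them apart from the live conversation.
    e.Prefix = "## Retrieved notes\n";

    // Or skip injection for this turn entirely:
    // e.Cancel = true;
};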