Class MultiTurnConversation

Namespace
LMKit.TextGeneration
Assembly
LM-Kit.NET.dll

High-level, production-ready conversation runtime for multi-turn chat.

MultiTurnConversation wraps a local language model and maintains the running conversation state (messages, system prompt, tool-calls, memory recall, etc.). It exposes a compact API to:

  • submit user prompts (sync/async),
  • regenerate or continue the last assistant answer,
  • register model-callable tools and control per-turn tool policy,
  • inject long-term AgentMemory and cap recall tokens,
  • configure sampling (temperature/top-p/etc.) and repetition penalties,
  • enforce structure with Grammar (mutually exclusive with tools),
  • and persist/restore full chat sessions.

Threading model: generation operations are serialized internally so only one call runs at a time. Create one instance per independent conversation. Share the underlying LM across conversations if desired.

Typical usage:

// Load a model (ensure tensors/weights are loaded).
var lm = new LM("path/to/model.gguf", new LM.LoadingOptions { LoadTensors = true });

// Create a conversation with default settings.
using var chat = new MultiTurnConversation(lm);

// (Optional) set a system prompt before the first user message.
chat.SystemPrompt = "You are a concise technical assistant.";

// (Optional) register tools before first turn if your model supports tool-calls.
if (chat.Model.HasToolCalls)
{
    chat.Tools.Register(new WebSearchTool());
    chat.ToolPolicy.Choice = ToolChoice.Auto; // let the model decide
}

// Submit a user message
var result = chat.Submit("How do I stream tokens from this API?");
Console.WriteLine(result.Content);

// Regenerate a different answer for the same user turn
var alt = chat.RegenerateResponse();
Console.WriteLine(alt.Content);

// Save the entire session for later
chat.SaveSession("session.bin");

public sealed class MultiTurnConversation : IConversation, ITextGenerationSettings, IDisposable

Inheritance
object → MultiTurnConversation

Implements
IConversation, ITextGenerationSettings, IDisposable

Constructors

MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings)

Continue a conversation from an existing ChatHistory.

Use this when you want to resume an earlier interaction without loading a serialized session file. The provided history is cloned, and the model reference is normalized for this instance.
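For example, branching a new conversation off an existing one's transcript (passing `-1` for the context size and `null` for the settings argument are assumed acceptable defaults):

```csharp
// Reuse the transcript of an existing conversation in a new instance.
// The history is cloned, so the two conversations diverge independently.
var history = chat.ChatHistory;

// -1 context size and a null settings argument are assumptions here.
using var branch = new MultiTurnConversation(lm, history, -1, null);
var reply = branch.Submit("Let's explore a different approach.");
```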

MultiTurnConversation(LM, byte[])

Restore a previous conversation from serialized bytes.

MultiTurnConversation(LM, int)

Create a new conversation with an LM and optional custom context size.

Pass contextSize = -1 (default) to let LM-Kit pick an optimal size based on hardware and model settings. Otherwise, the value is bounded by model limits and minimal configuration.

MultiTurnConversation(LM, string)

Restore a previous conversation from a session file.

Properties

ChatHistory

The full chat history of this conversation (system, user, assistant, and tool messages).

The runtime appends messages as you call Submit(string, CancellationToken) and related methods. You can snapshot/clone and reuse histories across conversations via the MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings) constructor.

ContextRemainingSpace

Remaining token budget currently available in the context window.

ContextSize

Total token context size for this conversation (i.e., model's window for prompt + response).

DisableReasoning

Suppresses intermediate "reasoning"/"thinking" content for models that produce it.

When set to true, the conversation runtime requests that the model skip exposing intermediate reasoning. Support depends on model capabilities and configuration.

Grammar

Grammar rules used to constrain model output (e.g., to produce valid JSON). When set to non-null, repetition penalties are disabled by default to prevent conflicts. Incompatibility: Grammar and tool-calls cannot be used together. If any tool is registered in Tools while Grammar is non-null, Submit(string, CancellationToken) throws InvalidOperationException.
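A minimal sketch of toggling a pre-built Grammar on for a single structured turn (how the `Grammar` instance itself is constructed is out of scope here):

```csharp
// Requires that no tool is registered in Tools; otherwise Submit throws
// InvalidOperationException.
chat.Grammar = jsonGrammar;   // a Grammar instance built elsewhere
var structured = chat.Submit("Return the settings as a JSON object.");

chat.Grammar = null;          // lift the constraint for later free-form turns
```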

InferencePolicies

Policies that govern inference behavior (e.g., input overflow handling).

LogitBias

Logit bias adjustments applied during generation.

Use to nudge the model toward/away from specific tokens. You can also modify bias dynamically via BeforeTokenSampling.

MaximumCompletionTokens

Maximum tokens allowed for the assistant completion per turn.

Default is 512. Set to -1 to disable the limit entirely (subject to context capacity).
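For instance, to trade answer length against the remaining context budget per turn:

```csharp
chat.MaximumCompletionTokens = 256; // cap each assistant turn at 256 tokens
// ...
chat.MaximumCompletionTokens = -1;  // remove the cap; bounded only by context capacity

// Optionally guard against running out of room before the next turn.
if (chat.ContextRemainingSpace < 256)
    chat.ClearHistory(); // or trim/summarize the conversation first
```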

MaximumRecallTokens

Maximum number of tokens recalled from Memory per turn.

Defaults to ContextSize / 4. The effective value is automatically capped to at most ContextSize / 2.

Memory

Long-term memory store used to recall relevant context across turns.

Assign an AgentMemory implementation to enable retrieval of relevant text partitions (e.g., user docs, FAQs). Retrieved snippets are injected as hidden context up to MaximumRecallTokens.
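A sketch of wiring in long-term memory, assuming an `AgentMemory` instance has been built and populated elsewhere:

```csharp
chat.Memory = agentMemory;                       // an AgentMemory populated elsewhere
chat.MaximumRecallTokens = chat.ContextSize / 8; // tighten recall below the default (ContextSize / 4)
var answer = chat.Submit("What does our FAQ say about refunds?");
```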

Model

The underlying language model used by this conversation.

The same LM instance can be shared across conversations. This property is provided for inspection (e.g., capabilities such as HasToolCalls).

RepetitionPenalty

Repetition penalty configuration used during generation.

Adjust to discourage the model from repeating recent n-grams/tokens. Disabled automatically when Grammar is set (to avoid conflicts), but you can re-enable manually if needed.

SamplingMode

Token sampling strategy (e.g., temperature, top-p, top-k, dynamic sampling).

You can swap in a custom TokenSampling strategy at runtime.

StopSequences

Sequences that cause generation to stop immediately when encountered.

Matching stop sequences are not included in the final output.
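For example (assuming `StopSequences` is a standard string collection with an `Add` method):

```csharp
// Generation halts as soon as the marker is produced; the marker itself
// is excluded from result.Content.
chat.StopSequences.Add("</answer>");
var result = chat.Submit("Reply inside <answer>...</answer> tags.");
```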

SystemPrompt

System prompt applied to the model before the first user turn.

Set this property before the first user message. After a chat has started, the system prompt becomes immutable for the lifetime of this MultiTurnConversation. Create a new instance if you need a different system prompt midstream.

ToolPolicy

Per-turn tool-calling policy used by the conversation runtime.

Controls whether tools are allowed, required, disabled, or whether a specific tool must be used on the current turn. This object guides the runtime; it does not enforce behavior on its own.

Defaults to Auto (the model may or may not call a tool). When using Specific, also set ForcedToolName. Keep AllowParallelCalls = false unless your tools are idempotent and thread-safe.

Example:
conversation.ToolPolicy.Choice = ToolChoice.Specific;
conversation.ToolPolicy.ForcedToolName = "web_search";
conversation.ToolPolicy.MaxCallsPerTurn = 2;

Tools

Registry of model-callable tools available to this conversation.

Register tools before the first user turn so they are advertised to the model. Tool invocation requires a model that supports tool calls (HasToolCalls).

Important: Grammar-constrained generation and tool-calls are mutually exclusive. If any tool is registered and Grammar is non-null, submission throws InvalidOperationException.

Example:
if (conversation.Model.HasToolCalls)
{
    conversation.Tools.Register(new WebSearchTool());
    conversation.ToolPolicy.Choice = ToolChoice.Auto;
}
else
{
    conversation.ToolPolicy.Choice = ToolChoice.None; // avoid tool-call attempts
}

Methods

ClearHistory()

Clear the entire conversation: removes all messages and resets internal state.

Use to start fresh while keeping the same model and configuration. The next submission will behave like a brand new session (including re-applying the SystemPrompt if set).

ContinueLastAssistantResponse(CancellationToken)

Continue generating more tokens for the last assistant response (no new user input).

ContinueLastAssistantResponseAsync(CancellationToken)

Asynchronously continue the last assistant message without any new user input.

Useful to let the model finish or expand an answer that was intentionally constrained by MaximumCompletionTokens. Not supported when Grammar is set.
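A sketch of a two-phase answer, where the first turn is deliberately capped and then extended (the result's `Content` property follows the usage example above):

```csharp
chat.MaximumCompletionTokens = 128;
var partial = await chat.SubmitAsync("Explain context windows in depth.", CancellationToken.None);

// Resume the same assistant message; no new user turn is added.
chat.MaximumCompletionTokens = 512;
var rest = await chat.ContinueLastAssistantResponseAsync(CancellationToken.None);
Console.WriteLine(partial.Content + rest.Content);
```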

Dispose()

Dispose this conversation and release resources.

After disposal, any further method calls throw ObjectDisposedException.

~MultiTurnConversation()

Finalizer to ensure unmanaged resources are released if Dispose() was not called.

RegenerateResponse(CancellationToken)

Regenerate a fresh response to the most recent user message without altering prior turns.

The previous assistant answer for that user message is replaced by a new one. Use this when you want an alternative phrasing or reasoning path.

RegenerateResponseAsync(CancellationToken)

Asynchronously regenerate a response to the most recent user message.

As with the synchronous overload, the previous assistant answer for that turn is replaced in history by the newly generated one.

SaveSession()

Save the current session state (messages + runtime state) to bytes.

Use together with the MultiTurnConversation(LM, byte[]) constructor to restore later.

SaveSession(string)

Save the current session state to a file on disk.

Use together with the MultiTurnConversation(LM, string) constructor to restore later.
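A round-trip sketch using the file-based overload:

```csharp
// Persist the full session (messages + runtime state).
chat.SaveSession("support-case.bin");

// Later, possibly in a new process, restore against the same model.
using var restored = new MultiTurnConversation(lm, "support-case.bin");
var followUp = restored.Submit("Where did we leave off?");
```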

Submit(Prompt, CancellationToken)

Submit a Prompt (text and/or attachments) synchronously.

Use this overload when you need to control advanced prompt fields (e.g., NullOnDoubt or auxiliary content).
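A sketch of the Prompt-based overload; the constructor shape and object-initializer syntax are assumptions, and only `NullOnDoubt` is named above:

```csharp
// Hypothetical construction: only the NullOnDoubt field is documented above.
var prompt = new Prompt("Summarize the attached report.")
{
    NullOnDoubt = true // prefer a null answer over a guess
};
var answer = chat.Submit(prompt, CancellationToken.None);
```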

Submit(string, CancellationToken)

Submit a user prompt (string) and get a completion synchronously.

SubmitAsync(Prompt, CancellationToken)

Submit a Prompt (text and/or attachments) asynchronously.

SubmitAsync(string, CancellationToken)

Submit a user prompt (string) asynchronously.

Events

AfterTextCompletion

Fired right after a completion finishes.

Use this event to inspect the full assistant output and optionally request to stop further post-processing by setting Stop.

AfterTokenSampling

Fired immediately after a token is selected.

Handlers can override the chosen token, stop generation, or keep the last token out of the final output (useful for control tokens).

AfterToolInvocation

Fired after a tool invocation finishes (or when it was cancelled/errored).

BeforeTokenSampling

Fired just before the runtime samples the next token.

Use this to dynamically adjust sampling parameters or logit bias during generation. Setting Stop requests early stop.
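A handler sketch; the event-args member name `Stop` follows the description above, but the exact signature is an assumption:

```csharp
int sampled = 0;
chat.BeforeTokenSampling += (sender, e) =>
{
    // Example: hard-stop after 1000 sampled tokens, regardless of other limits.
    if (++sampled > 1000)
        e.Stop = true; // request an early stop before the next token
};
```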

BeforeToolInvocation

Fired before a tool invocation. Handlers may cancel the call.

MemoryRecall

Fired when one or more memory partitions are recalled for this turn.

Subscribers may inspect the recalled content and optionally cancel injection by setting Cancel to true. You can also prepend a custom prefix (e.g., a section header) via Prefix.
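A handler sketch; `Cancel` and `Prefix` follow the member names given above:

```csharp
chat.MemoryRecall += (sender, e) =>
{
    // Label the injected snippets so they are distinguishable in the context.
    e.Prefix = "### Retrieved context\n";

    // Or veto injection for this turn:
    // e.Cancel = true;
};
```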