Class MultiTurnConversation
- Namespace: LMKit.TextGeneration
- Assembly: LM-Kit.NET.dll
High-level, production-ready conversation runtime for multi-turn chat.
MultiTurnConversation wraps a local language model and maintains the running conversation state (messages, system prompt, tool-calls, memory recall, etc.). It exposes a compact API to:
- submit user prompts (sync/async),
- regenerate or continue the last assistant answer,
- register model-callable tools and control per-turn tool policy,
- inject long-term AgentMemory and cap recall tokens,
- configure sampling (temperature/top-p/etc.) and repetition penalties,
- enforce structure with Grammar (mutually exclusive with tools),
- and persist/restore full chat sessions.
Threading model: generation operations are serialized internally so only one call runs at a time. Create one instance per independent conversation. Share the underlying LM across conversations if desired.
Typical usage:
// Load a model (ensure tensors/weights are loaded).
var lm = new LM("path/to/model.gguf", new LM.LoadingOptions { LoadTensors = true });
// Create a conversation with default settings.
using var chat = new MultiTurnConversation(lm);
// (Optional) set a system prompt before the first user message.
chat.SystemPrompt = "You are a concise technical assistant.";
// (Optional) register tools before first turn if your model supports tool-calls.
if (chat.Model.HasToolCalls)
{
chat.Tools.Register(new WebSearchTool());
chat.ToolPolicy.Choice = ToolChoice.Auto; // let the model decide
}
// Submit a user message
var result = chat.Submit("How do I stream tokens from this API?");
Console.WriteLine(result.Content);
// Regenerate a different answer for the same user turn
var alt = chat.RegenerateResponse();
Console.WriteLine(alt.Content);
// Save the entire session for later
chat.SaveSession("session.bin");
public sealed class MultiTurnConversation : IConversation, ITextGenerationSettings, IDisposable
- Inheritance
- object → MultiTurnConversation
- Implements
- IConversation, ITextGenerationSettings, IDisposable
Constructors
- MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings)
Continue a conversation from an existing ChatHistory.
Use this when you want to resume an earlier interaction without loading a serialized session file. The provided history is cloned, and the model reference is normalized for this instance.
- MultiTurnConversation(LM, byte[])
Restore a previous conversation from serialized bytes.
- MultiTurnConversation(LM, int)
Create a new conversation with an LM and optional custom context size.
Pass contextSize = -1 (default) to let LM-Kit pick an optimal size based on hardware and model settings. Otherwise, the value is bounded by model limits and minimal configuration.
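As a sketch (the contextSize parameter name comes from the signature above; the model path is a placeholder):

```csharp
// Sketch: create a conversation with an explicit 8K-token context window.
// Passing -1 instead would let LM-Kit pick an optimal size automatically.
var lm = new LM("path/to/model.gguf", new LM.LoadingOptions { LoadTensors = true });
using var chat = new MultiTurnConversation(lm, contextSize: 8192);
```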
- MultiTurnConversation(LM, string)
Restore a previous conversation from a session file.
Properties
- ChatHistory
The full chat history of this conversation (system, user, assistant, and tool messages).
The runtime appends messages as you call Submit(string, CancellationToken) and related methods. You can snapshot/clone and reuse histories across conversations via the MultiTurnConversation(LM, ChatHistory, int, ITextGenerationSettings) constructor.
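Continuing the typical-usage example above, a minimal sketch of branching a conversation by reusing its history (the trailing -1 and null assume default context sizing and settings, mirroring the constructor signature):

```csharp
// Sketch: fork the current conversation. The history is cloned, so the
// original and the branch evolve independently from this point on.
using var branch = new MultiTurnConversation(lm, chat.ChatHistory, -1, null);
var reply = branch.Submit("Continue, but answer in bullet points.");
```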
- ContextRemainingSpace
Remaining token budget currently available in the context window.
- ContextSize
Total token context size for this conversation (i.e., model's window for prompt + response).
- DisableReasoning
Suppresses intermediate "reasoning"/"thinking" content for models that produce it.
When set to true, the conversation runtime requests that the model skip exposing intermediate reasoning. Support depends on model capabilities and configuration.
- Grammar
Grammar rules used to constrain model output (e.g., to produce valid JSON).
When set to non-null, repetition penalties are disabled by default to prevent conflicts. Incompatibility: Grammar and tool-calls cannot be used together. If any tool is registered in Tools while Grammar is non-null, Submit(string, CancellationToken) throws InvalidOperationException.
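A hedged sketch of enabling grammar-constrained output, continuing the typical-usage example above. How a Grammar instance is built is not shown on this page, so jsonGrammar is assumed to have been constructed elsewhere:

```csharp
// Sketch: constrain output to a structured format. Grammar and tool-calls are
// mutually exclusive, so only set Grammar when no tools are registered.
chat.Grammar = jsonGrammar; // a Grammar instance built elsewhere (e.g., a JSON grammar)
var structured = chat.Submit("List three HTTP verbs as a JSON array.");
Console.WriteLine(structured.Content);
```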
- InferencePolicies
Policies that govern inference behavior (e.g., input overflow handling).
- LogitBias
Logit bias adjustments applied during generation.
Use to nudge the model toward/away from specific tokens. You can also modify bias dynamically via BeforeTokenSampling.
- MaximumCompletionTokens
Maximum tokens allowed for the assistant completion per turn.
Default is 512. Set to -1 to disable the limit entirely (subject to context capacity).
- MaximumRecallTokens
Maximum number of tokens recalled from Memory per turn.
Defaults to ContextSize / 4. The effective value is automatically capped at ContextSize / 2.
- Memory
Long-term memory store used to recall relevant context across turns.
Assign an AgentMemory implementation to enable retrieval of relevant text partitions (e.g., user docs, FAQs). Retrieved snippets are injected as hidden context up to MaximumRecallTokens.
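A sketch of wiring in memory, assuming an AgentMemory instance built elsewhere and continuing the example above:

```csharp
// Sketch: attach long-term memory and cap per-turn recall well below the
// automatic ContextSize / 2 ceiling.
chat.Memory = agentMemory;                       // an AgentMemory built elsewhere
chat.MaximumRecallTokens = chat.ContextSize / 8;
```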
- Model
The underlying language model used by this conversation.
The same LM instance can be shared across conversations. This property is provided for inspection (e.g., capabilities such as HasToolCalls).
- RepetitionPenalty
Repetition penalty configuration used during generation.
Adjust to discourage the model from repeating recent n-grams/tokens. Disabled automatically when Grammar is set (to avoid conflicts), but you can re-enable manually if needed.
- SamplingMode
Token sampling strategy (e.g., temperature, top-p, top-k, dynamic sampling).
You can swap in a custom TokenSampling strategy at runtime.
- StopSequences
Sequences that cause generation to stop immediately when encountered.
Matching stop sequences are not included in the final output.
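For example (assuming StopSequences behaves as a standard string collection; the delimiter is illustrative):

```csharp
// Sketch: halt generation at a custom delimiter. The matched delimiter
// itself is excluded from the returned content.
chat.StopSequences.Add("###END###");
```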
- SystemPrompt
System prompt applied to the model before the first user turn.
Set this property before the first user message. After a chat has started, the system prompt becomes immutable for the lifetime of this MultiTurnConversation. Create a new instance if you need a different system prompt midstream.
- ToolPolicy
Per-turn tool-calling policy used by the conversation runtime.
Controls whether tools are allowed, required, disabled, or whether a specific tool must be used on the current turn. This object guides the runtime; it does not enforce behavior on its own.
Defaults to Auto (the model may or may not call a tool). When using Specific, also set ForcedToolName. Keep AllowParallelCalls = false unless your tools are idempotent and thread-safe.
Example:
conversation.ToolPolicy.Choice = ToolChoice.Specific;
conversation.ToolPolicy.ForcedToolName = "web_search";
conversation.ToolPolicy.MaxCallsPerTurn = 2;
- Tools
Registry of model-callable tools available to this conversation.
Register tools before the first user turn so they are advertised to the model. Tool invocation requires a model that supports tool calls (HasToolCalls).
Important: Grammar-constrained generation and tool-calls are mutually exclusive. If any tool is registered and Grammar is non-null, submission throws InvalidOperationException.
Example:
if (conversation.Model.HasToolCalls)
{
    conversation.Tools.Register(new WebSearchTool());
    conversation.ToolPolicy.Choice = ToolChoice.Auto;
}
else
{
    conversation.ToolPolicy.Choice = ToolChoice.None; // avoid tool-call attempts
}
Methods
- ClearHistory()
Clear the entire conversation: removes all messages and resets internal state.
Use to start fresh while keeping the same model and configuration. The next submission will behave like a brand new session (including re-applying the SystemPrompt if set).
- ContinueLastAssistantResponse(CancellationToken)
Continue generating more tokens for the last assistant response (no new user input).
- ContinueLastAssistantResponseAsync(CancellationToken)
Asynchronously continue the last assistant message without any new user input.
Useful to let the model finish or expand an answer that was intentionally constrained by MaximumCompletionTokens. Not supported when Grammar is set.
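A sketch of the constrain-then-continue pattern described above, continuing the typical-usage example:

```csharp
// Sketch: cap the first pass, then let the model finish on demand.
chat.MaximumCompletionTokens = 128;
var first = await chat.SubmitAsync("Explain beam search.", CancellationToken.None);
var more = await chat.ContinueLastAssistantResponseAsync(CancellationToken.None);
```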
- Dispose()
Dispose this conversation and release resources.
After disposal, any further method calls throw ObjectDisposedException.
- ~MultiTurnConversation()
Finalizer to ensure unmanaged resources are released if Dispose() was not called.
- RegenerateResponse(CancellationToken)
Regenerate a fresh response to the most recent user message without altering prior turns.
The previous assistant answer for that user message is replaced by a new one. Use this when you want an alternative phrasing or reasoning path.
- RegenerateResponseAsync(CancellationToken)
Asynchronously regenerate a response to the most recent user message.
The previous assistant answer is replaced in place on the same turn rather than appended as a new message; earlier turns are untouched.
- SaveSession()
Save the current session state (messages + runtime state) to bytes.
Use together with the MultiTurnConversation(LM, byte[]) constructor to restore later.
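A round-trip sketch, continuing the example above:

```csharp
// Sketch: snapshot the session to bytes, then restore it later into a
// new instance that shares the same LM.
byte[] snapshot = chat.SaveSession();
using var restored = new MultiTurnConversation(lm, snapshot);
```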
- SaveSession(string)
Save the current session state to a file on disk.
Use together with the MultiTurnConversation(LM, string) constructor to restore later.
- Submit(Prompt, CancellationToken)
Submit a Prompt (text and/or attachments) synchronously.
Use this overload when you need to control advanced prompt fields (e.g., NullOnDoubt or auxiliary content).
- Submit(string, CancellationToken)
Submit a user prompt (string) and get a completion synchronously.
- SubmitAsync(Prompt, CancellationToken)
Submit a Prompt (text and/or attachments) asynchronously.
- SubmitAsync(string, CancellationToken)
Submit a user prompt (string) asynchronously.
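A sketch of async submission with a timeout, continuing the example above (the 30-second budget is arbitrary):

```csharp
// Sketch: cancel generation if it exceeds a time budget.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
try
{
    var reply = await chat.SubmitAsync("Summarize our discussion so far.", cts.Token);
    Console.WriteLine(reply.Content);
}
catch (OperationCanceledException)
{
    // Generation exceeded the 30-second budget.
}
```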
Events
- AfterTextCompletion
Fired right after a completion finishes.
Use this event to inspect the full assistant output and optionally request to stop further post-processing by setting Stop.
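A hedged sketch of a handler; the event-args member names (Text, Stop) are assumptions based on the description above, not confirmed signatures:

```csharp
// Sketch: inspect the finished completion and optionally halt post-processing.
chat.AfterTextCompletion += (sender, e) =>
{
    if (e.Text.Contains("CONFIDENTIAL")) // hypothetical member name
        e.Stop = true;                   // hypothetical: request stop
};
```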
- AfterTokenSampling
Fired immediately after a token is selected.
Handlers can override the chosen token, stop generation, or keep the last token out of the final output (useful for control tokens).
- AfterToolInvocation
Fired after a tool invocation finishes (or when it was cancelled/errored).
- BeforeTokenSampling
Fired just before the runtime samples the next token.
Use this to dynamically adjust sampling parameters or logit bias during generation. Setting Stop requests early stop.
- BeforeToolInvocation
Fired before a tool invocation. Handlers may cancel the call.
- MemoryRecall
Fired when one or more memory partitions are recalled for this turn.
Subscribers may inspect the recalled content and optionally cancel injection by setting Cancel to true. You can also prepend a custom prefix (e.g., a section header) via Prefix.