💬 What is Chat Completion?
📄 TL;DR
Chat completion is a generation mode where the model produces the next assistant message from a structured conversation history instead of a single standalone prompt. You send a sequence of messages (system, user, assistant, tool output), and the model answers while taking the full transcript into account.
In LM-Kit.NET, chat completion is primarily implemented through MultiTurnConversation, which maintains the evolving ChatHistory and tracks practical constraints like context size, remaining token budget, and per-turn limits. (LM-Kit Docs)
🧠 What Is Chat Completion?
Think of chat completion as “predict the next reply in a dialogue”.
A conversation is represented as:
- A history of messages (`ChatHistory`) (LM-Kit Docs)
- Each message has a role (`AuthorRole`) such as System, User, Assistant, or Tool (LM-Kit Docs)
- The model generates the next Assistant message based on the whole transcript
This is what makes assistants feel coherent. A user can say “do the same but shorter”, and the model knows what “the same” refers to because it sees the previous turns.
⚙️ How Chat Completion Works in LM-Kit.NET
1) The runtime stores the conversation as ChatHistory
ChatHistory contains the message list and provides helpers to build the model prompt, including:
- `Messages` and `MessageCount` (LM-Kit Docs)
- Role-aware formatting via prefixes and suffixes (`SystemPrefix`, `UserPrefix`, `AssistantPrefix`, and their corresponding suffixes) (LM-Kit Docs)
- `ToText()` to render the full formatted prompt, and `ToTokens()` to render the prompt as model tokens (LM-Kit Docs)
That last part is important because it connects the “chat transcript” you see as a developer to what the model actually receives.
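As a quick sketch (assuming a `chat` object built like the sample later in this article), you can inspect both forms side by side:

```csharp
// Render the transcript exactly as the model will receive it.
// ToText() and ToTokens() are the ChatHistory helpers named above;
// everything else here is illustrative.
string prompt = chat.ChatHistory.ToText();   // full role-formatted prompt text
var tokens = chat.ChatHistory.ToTokens();    // the same prompt as model tokens

Console.WriteLine(prompt);                   // what you can read
```

Comparing the rendered text with the raw message list makes the role prefixes and suffixes visible, which is useful when debugging prompt formatting.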
2) MultiTurnConversation appends turns as you chat
MultiTurnConversation exposes a ChatHistory property and automatically appends messages as you call Submit(...). (LM-Kit Docs)
It also supports real-world workflows:
- Start fresh with a chosen context size, or let LM-Kit pick an optimal one (`contextSize = -1`) based on hardware and model settings (LM-Kit Docs)
- Resume from an existing `ChatHistory` (it clones the history) (LM-Kit Docs)
- Restore from a serialized session (bytes or file) (LM-Kit Docs)
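A minimal sketch of the three startup paths. Only the `contextSize: -1` form appears verbatim in the docs cited above; the other two constructor shapes are illustrative assumptions, not exact signatures:

```csharp
// 1) Start fresh; -1 lets LM-Kit pick an optimal context size
var chat = new MultiTurnConversation(lm, contextSize: -1);

// 2) Resume from an existing ChatHistory (the history is cloned,
//    so the original object is left untouched)
var resumed = new MultiTurnConversation(lm, existingHistory);

// 3) Restore from a serialized session (bytes or file)
var restored = new MultiTurnConversation(lm, sessionBytes);
```

The clone-on-resume behavior matters in practice: you can branch a conversation from a saved history without mutating the saved copy.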
3) Context is a budget, not a vibe
Chat completion is always constrained by the model’s context window. LM-Kit.NET surfaces that clearly:
- `ContextSize`: total window for prompt plus response (LM-Kit Docs)
- `ContextRemainingSpace`: what is still available right now (LM-Kit Docs)
- `MaximumCompletionTokens`: per-turn cap (default 2048; `-1` disables the limit, subject to context capacity) (LM-Kit Docs)
That makes chat completion predictable: you can measure when you are close to overflow and decide how to trim or summarize history.
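As a sketch, a turn loop that treats the context window as a budget might look like this. The property names come from the docs cited above; the threshold and the trimming strategy are application choices, not LM-Kit defaults:

```csharp
// Cap each reply so one verbose turn cannot eat the whole window
chat.MaximumCompletionTokens = 1024;   // default is 2048

// Check the budget before submitting the next user message
if (chat.ContextRemainingSpace < 1024)
{
    // Close to overflow: trim or summarize older turns first.
    // (How you summarize is up to the application.)
}

var reply = chat.Submit(userInput);
```

Because `ContextRemainingSpace` reflects the state right now, checking it before each `Submit(...)` keeps overflow handling deterministic instead of reactive.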
🧩 Roles: the hidden superpower of chat
Roles are how you get reliable behavior without turning every prompt into a fragile wall of text.
LM-Kit.NET’s AuthorRole explicitly distinguishes:
- System: sets behavior and high-level context (LM-Kit Docs)
- User: what the user said (LM-Kit Docs)
- Assistant: what the model generated (LM-Kit Docs)
- Tool: structured tool results returned to the model after a tool call (LM-Kit Docs)
- Developer: application-level instructions and policies (LM-Kit Docs)
This is where reliable behavior starts: instead of stuffing everything into one prompt, you place each piece of information in the right role so the model treats it correctly.
🔧 Tools + Chat Completion: where assistants become useful
Chat completion becomes an “agent loop” once tools enter the picture.
LM-Kit.NET’s tools demo describes a pattern where the model can decide to call one or more tools, pass JSON arguments matching each tool schema, then use the tool’s JSON results to craft a grounded reply. Tools implement ITool, and behavior can be shaped via ToolChoice. (LM-Kit Docs)
Two key design details matter a lot in production:
✅ Tool results belong in the transcript
Tool output is stored as AuthorRole.Tool, meaning it becomes part of the conversation state that future turns can reference. (LM-Kit Docs)
⚠️ Grammar constraints do not mix with tools
MultiTurnConversation documents a hard incompatibility: if Grammar is set (used to constrain output like JSON) and any tool is registered, Submit(...) throws an InvalidOperationException. (LM-Kit Docs)
Practical takeaway: pick one per flow.
- Need strict JSON? Use grammar constraints.
- Need tool calling? Let tools enforce structure and validation.
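A sketch of the two mutually exclusive flows. The `Grammar` property and the `InvalidOperationException` behavior are documented; the tool registration call shown here is a hypothetical placeholder, since the exact registration API is not spelled out in this article:

```csharp
// Flow A: strict JSON via grammar constraints; no tools registered
var jsonChat = new MultiTurnConversation(lm);
jsonChat.Grammar = myJsonGrammar;        // constrains output shape

// Flow B: tool calling; no grammar set
var toolChat = new MultiTurnConversation(lm);
toolChat.RegisterTool(weatherTool);      // hypothetical registration call

// Setting Grammar AND registering a tool on the same conversation
// would make Submit(...) throw InvalidOperationException.
```

Keeping the two flows in separate conversation instances makes the constraint impossible to violate by accident.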
🧠 Memory in chat completion
MultiTurnConversation includes a Memory store for recalling relevant context across turns, plus controls like MaximumRecallTokens to cap how much recalled content is injected per turn. (LM-Kit Docs)
This is the difference between:
- short-term memory: the live `ChatHistory`
- long-term memory: recalled context injected when useful
It is how chats stay helpful even when the user returns after many turns or switches topics.
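As a one-line sketch: `MaximumRecallTokens` is the documented knob for bounding recall, and the value here is an arbitrary example, not a recommended default:

```csharp
// Cap how much recalled long-term context is injected per turn,
// so memory recall cannot crowd out the live transcript.
chat.MaximumRecallTokens = 512;
```

This keeps the recall budget explicit, which matters because recalled content competes with the live history for the same context window.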
🔁 Chat completion vs single-turn completion
If your app is “ask once, answer once”, single-turn is simpler and faster.
LM-Kit.NET makes the distinction explicit:
- `SingleTurnConversation` is designed for single-turn Q&A and does not preserve context between questions and answers (LM-Kit Docs)
- `MultiTurnConversation` preserves the full dialogue history, tool results, and memory recall across turns (LM-Kit Docs)
A nice mental model:
- Single-turn is like a search box.
- Multi-turn chat is like a relationship: it remembers.
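The difference shows up immediately in code. A sketch (constructor shapes are illustrative assumptions; the contrasting behavior is what the docs cited above describe):

```csharp
// Single-turn: each question stands alone
var qa = new SingleTurnConversation(lm);
var a1 = qa.Submit("What is retrieval-augmented generation?");
var a2 = qa.Submit("Give an example.");   // has no memory of the first question

// Multi-turn: each turn builds on the transcript
var chat = new MultiTurnConversation(lm);
chat.Submit("What is retrieval-augmented generation?");
chat.Submit("Give an example.");          // answered in context of the first turn
```

If "Give an example" must mean "an example of what we just discussed", you need the multi-turn variant.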
🧪 A tiny C# mental model (history-first)
```csharp
// Pseudocode style to illustrate the flow

// 1) Create a multi-turn conversation
var chat = new MultiTurnConversation(lm, contextSize: -1);

// 2) Chat completion is "append message, generate next assistant turn"
var reply1 = chat.Submit("Hello! Summarize this project in 3 bullets.");

// 3) The history now contains System/User/Assistant turns
var promptText = chat.ChatHistory.ToText();

// 4) Next turn builds on the full transcript
var reply2 = chat.Submit("Make it shorter and more technical.");
```
The “magic” is not the second prompt. The magic is that it is evaluated inside a growing transcript. (LM-Kit Docs)
🌟 How to make chat completion feel amazing
Three high-leverage tips that map directly to LM-Kit.NET concepts:
1. **Treat context like money.** Watch `ContextRemainingSpace` and avoid "history bloat" by summarizing older turns when needed. (LM-Kit Docs)
2. **Use roles deliberately.** Put policies in System or Developer roles, not inside user text. Roles exist to prevent your prompt from becoming spaghetti. (LM-Kit Docs)
3. **Ground facts with tools.** If the answer must be correct, call tools and store results as tool messages, then let the assistant explain. (LM-Kit Docs)
Bonus: if you want the assistant to “feel” different, sampling changes help. LM-Kit has a dedicated multi-turn chat sample for custom sampling strategies (top-k, top-p, temperature, logit biases). (LM-Kit Docs)
📝 Summary
Chat completion is next-message generation over a role-aware conversation history.
In LM-Kit.NET, the core building blocks are:
- `MultiTurnConversation` for multi-turn state, context budgeting, memory recall, and tool-aware chat (LM-Kit Docs)
- `ChatHistory` for storing, formatting, tokenizing, and serializing the conversation (LM-Kit Docs)
- `AuthorRole` for separating system instructions, user input, assistant output, and tool results (LM-Kit Docs)