Build a Conversational Assistant with Memory

A useful assistant remembers what you told it: your preferences, project details, and past decisions. LM-Kit.NET's MultiTurnConversation class maintains chat history across turns, and AgentMemory persists knowledge across sessions. This tutorial builds a conversational assistant that streams responses, saves sessions, and recalls information from previous conversations.


Why Local Conversational Assistants Matter

Two enterprise problems that on-device assistants solve:

  1. Conversation history stays on-premises. Multi-turn assistants accumulate detailed context about users, projects, and business processes. With a local model, that context never leaves your infrastructure, making it safe for internal tools that handle sensitive projects.
  2. Predictable latency and availability. A local assistant responds in consistent time regardless of API traffic, outages, or rate limits. This matters for real-time tools where users wait on each response.

Prerequisites

Requirement | Minimum
.NET SDK    | 8.0+
VRAM        | 4+ GB
Disk        | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n AssistantQuickstart
cd AssistantQuickstart
dotnet add package LM-Kit.NET

Step 2: Basic Multi-Turn Chat

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create conversation with streaming
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful coding assistant. " +
        "Give concise, practical answers with code examples when relevant.",
    MaximumCompletionTokens = 1024
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// 3. Chat loop
// ──────────────────────────────────────
Console.WriteLine("Chat with the assistant (type 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    TextGenerationResult result = chat.Submit(input);
    Console.WriteLine($"\n  [{result.GeneratedTokenCount} tokens, {result.TokenGenerationRate:F1} tok/s]\n");
}

Step 3: Save and Restore Sessions

Persist conversation state to disk so users can resume later:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");


string sessionFile = "chat_session.bin";

// ──────────────────────────────────────
// Restore previous session if available
// ──────────────────────────────────────
MultiTurnConversation chat;

if (File.Exists(sessionFile))
{
    Console.WriteLine("Restoring previous session...\n");
    chat = new MultiTurnConversation(model, sessionFile);
}
else
{
    chat = new MultiTurnConversation(model)
    {
        SystemPrompt = "You are a helpful, conversational assistant."
    };
}

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// Chat loop with session save
// ──────────────────────────────────────
while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    chat.Submit(input);
    Console.WriteLine("\n");
}

// Save session on exit
chat.SaveSession(sessionFile);
Console.WriteLine($"Session saved to {sessionFile}");

chat.Dispose();

Step 4: Chat History Management

Inspect and manipulate the conversation history:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful assistant."
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// Have a conversation
Console.Write("Assistant: ");
chat.Submit("My name is Alice and I'm working on Project Atlas.");
Console.WriteLine("\n");

Console.Write("Assistant: ");
chat.Submit("We're building a recommendation engine using collaborative filtering.");
Console.WriteLine("\n");

// Inspect chat history
Console.WriteLine($"History: {chat.ChatHistory.Messages.Count} messages");
Console.WriteLine($"Context: {chat.ContextRemainingSpace} tokens remaining\n");

foreach (var msg in chat.ChatHistory.Messages)
{
    string role = msg.AuthorRole.ToString();
    string preview = msg.Content.Length > 60 ? msg.Content.Substring(0, 60) + "..." : msg.Content;
    Console.WriteLine($"  [{role}] {preview}");
}

// The assistant now knows about Alice and Project Atlas
Console.Write("\nAssistant: ");
chat.Submit("What project am I working on and what approach are we using?");
Console.WriteLine("\n");

Step 5: Long-Term Memory with AgentMemory

AgentMemory stores knowledge across sessions using retrieval-augmented generation (RAG): facts are embedded, stored, and retrieved when relevant, so the assistant can recall information from previous conversations.

Tip: You can also enable automatic memory extraction so the LLM identifies and stores facts from each conversation turn without any manual code. See Remember Facts Across Chat Sessions with Automatic Memory Extraction.

// This snippet builds on Step 2: `model` is the chat model loaded there.
using System.Text;
using LMKit.Agents;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

// Load an embedding model for memory storage
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine();

// Create memory backed by the embedding model. memoryFile is where you
// would serialize the memory between runs; see the AgentMemory API for
// its save/load methods.
string memoryFile = "assistant_memory.dat";
var memory = new AgentMemory(embeddingModel);

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful, conversational assistant.",
    Memory = memory,
    MaximumRecallTokens = 512
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// Chat loop
Console.WriteLine("Chat with memory-enabled assistant:\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    chat.Submit(input);
    Console.WriteLine("\n");
}

chat.Dispose();

Step 6: Specialized Assistant Personas

Create different assistants for different tasks by changing the system prompt:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// Code review assistant
var codeReviewer = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a senior software engineer performing code reviews. " +
        "Focus on bugs, security issues, performance problems, and maintainability. " +
        "Be direct and specific. Reference line numbers when possible.",
    MaximumCompletionTokens = 2048
};

// Technical writer
var techWriter = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a technical writer. Rewrite explanations to be clear, " +
        "well-structured, and accessible to developers of all levels. " +
        "Use examples and avoid jargon.",
    MaximumCompletionTokens = 1024
};

// SQL assistant
var sqlHelper = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a database expert. Help write and optimize SQL queries. " +
        "Always explain the query logic. Warn about potential performance issues " +
        "with large tables.",
    MaximumCompletionTokens = 1024
};
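
Each persona above is an independent conversation sharing the one loaded model. A minimal sketch of using one of them, reusing the streaming event pattern from Step 2 (assumes `using LMKit.TextGeneration.Chat;` is in scope for TextSegmentType):

```csharp
// Wire streaming output for the code-review persona.
codeReviewer.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.Write("Reviewer: ");
codeReviewer.Submit(
    "Review this method:\n" +
    "public int Parse(string s) { return int.Parse(s); }");
Console.WriteLine("\n");

// Each persona keeps its own history, so follow-ups stay in context.
Console.Write("Reviewer: ");
codeReviewer.Submit("How would you handle invalid input?");
Console.WriteLine("\n");
```

Because each MultiTurnConversation holds its own chat history, switching personas never leaks context between them, even though they share one model in VRAM.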

Common Issues

Problem | Cause | Fix
Assistant forgets earlier context | Context window full | Increase ContextSize, or use AgentMemory for long-term recall
Responses cut off mid-sentence | MaximumCompletionTokens too low | Increase to 1024 or 2048
Slow first response | Model not cached in VRAM | First inference is slower; subsequent ones are faster
Session file too large | Long conversation history | Call chat.ClearHistory() periodically; rely on AgentMemory for recall
System prompt ignored | Prompt too long or vague | Keep system prompts under 200 words; be specific about behavior
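
To keep session files small, one option from the table above is to clear the in-context history once the context window runs low, relying on AgentMemory (Step 5) for anything that must survive. A hedged sketch; the 256-token threshold is an arbitrary example value, and ContextRemainingSpace is the property used in Step 4:

```csharp
// Inside the chat loop, after chat.Submit(input):
// trim in-context history when the context window is nearly full.
if (chat.ContextRemainingSpace < 256)
{
    Console.WriteLine("[Context nearly full; clearing chat history]");
    chat.ClearHistory();
}
```

Clearing history discards short-term context, so pair this with a memory-enabled conversation if the assistant must keep recalling earlier facts.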
