Build a Conversational Assistant with Memory

A useful assistant remembers what you told it: your preferences, project details, and past decisions. LM-Kit.NET's MultiTurnConversation class maintains chat history across turns, and AgentMemory persists knowledge across sessions. This tutorial builds a conversational assistant that streams responses, saves sessions, and recalls information from previous conversations.


Why Local Conversational Assistants Matter

Two enterprise problems that on-device assistants solve:

  1. Conversation history stays on-premises. Multi-turn assistants accumulate detailed context about users, projects, and business processes. With a local model, that context never leaves your infrastructure, making it safe for internal tools that handle sensitive projects.
  2. Predictable latency and availability. A local assistant responds in consistent time regardless of API traffic, outages, or rate limits. This matters for real-time tools where users wait on each response.

Prerequisites

Requirement | Minimum
.NET SDK    | 8.0+
VRAM        | 4+ GB
Disk        | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n AssistantQuickstart
cd AssistantQuickstart
dotnet add package LM-Kit.NET

Step 2: Basic Multi-Turn Chat

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create conversation with streaming
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful coding assistant. " +
        "Give concise, practical answers with code examples when relevant.",
    MaximumCompletionTokens = 1024
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// 3. Chat loop
// ──────────────────────────────────────
Console.WriteLine("Chat with the assistant (type 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    TextGenerationResult result = chat.Submit(input);
    Console.WriteLine($"\n  [{result.GeneratedTokenCount} tokens, {result.TokenGenerationRate:F1} tok/s]\n");
}

Step 3: Save and Restore Sessions

Persist conversation state to disk so users can resume later:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");


string sessionFile = "chat_session.bin";

// ──────────────────────────────────────
// Restore previous session if available
// ──────────────────────────────────────
MultiTurnConversation chat;

if (File.Exists(sessionFile))
{
    Console.WriteLine("Restoring previous session...\n");
    chat = new MultiTurnConversation(model, sessionFile);
}
else
{
    chat = new MultiTurnConversation(model)
    {
        SystemPrompt = "You are a helpful, conversational assistant."
    };
}

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// Chat loop with session save
// ──────────────────────────────────────
while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    chat.Submit(input);
    Console.WriteLine("\n");
}

// Save session on exit
chat.SaveSession(sessionFile);
Console.WriteLine($"Session saved to {sessionFile}");

chat.Dispose();

Step 4: Chat History Management

Inspect and manipulate the conversation history:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful assistant."
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// Have a conversation
Console.Write("Assistant: ");
chat.Submit("My name is Alice and I'm working on Project Atlas.");
Console.WriteLine("\n");

Console.Write("Assistant: ");
chat.Submit("We're building a recommendation engine using collaborative filtering.");
Console.WriteLine("\n");

// Inspect chat history
Console.WriteLine($"History: {chat.ChatHistory.Messages.Count} messages");
Console.WriteLine($"Context: {chat.ContextRemainingSpace} tokens remaining\n");

foreach (var msg in chat.ChatHistory.Messages)
{
    string role = msg.AuthorRole.ToString();
    string preview = msg.Content.Length > 60 ? msg.Content.Substring(0, 60) + "..." : msg.Content;
    Console.WriteLine($"  [{role}] {preview}");
}

// The assistant now knows about Alice and Project Atlas
Console.Write("\nAssistant: ");
chat.Submit("What project am I working on and what approach are we using?");
Console.WriteLine("\n");

Step 5: Long-Term Memory with AgentMemory

AgentMemory stores knowledge across sessions using retrieval-augmented generation (RAG): facts are embedded, stored, and retrieved when relevant, so the assistant can recall information from previous conversations.

Tip: You can also enable automatic memory extraction so the LLM identifies and stores facts from each conversation turn without any manual code. See Remember Facts Across Chat Sessions with Automatic Memory Extraction.

// This snippet builds on Step 2: `model` is the chat model loaded there.
using System.Text;
using LMKit.Agents;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

// Load an embedding model for memory storage
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine();

// Create memory backed by the embedding model. memoryFile is where you
// would serialize the memory between runs; see the AgentMemory API for
// its save/load methods.
string memoryFile = "assistant_memory.dat";
var memory = new AgentMemory(embeddingModel);

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful, conversational assistant.",
    Memory = memory,
    MaximumRecallTokens = 512
};

chat.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// Chat loop
Console.WriteLine("Chat with memory-enabled assistant:\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("You: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.ForegroundColor = ConsoleColor.Cyan;
    Console.Write("Assistant: ");
    Console.ResetColor();

    chat.Submit(input);
    Console.WriteLine("\n");
}

chat.Dispose();

Step 6: Specialized Assistant Personas

Create different assistants for different tasks by changing the system prompt:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// Code review assistant
var codeReviewer = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a senior software engineer performing code reviews. " +
        "Focus on bugs, security issues, performance problems, and maintainability. " +
        "Be direct and specific. Reference line numbers when possible.",
    MaximumCompletionTokens = 2048
};

// Technical writer
var techWriter = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a technical writer. Rewrite explanations to be clear, " +
        "well-structured, and accessible to developers of all levels. " +
        "Use examples and avoid jargon.",
    MaximumCompletionTokens = 1024
};

// SQL assistant
var sqlHelper = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a database expert. Help write and optimize SQL queries. " +
        "Always explain the query logic. Warn about potential performance issues " +
        "with large tables.",
    MaximumCompletionTokens = 1024
};
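
Each persona above is an independent conversation sharing the one loaded model. A minimal sketch of using one of them, reusing the streaming event pattern from Step 2 (assumes `using LMKit.TextGeneration.Chat;` is in scope for TextSegmentType):

```csharp
// Wire streaming output for the code-review persona.
codeReviewer.AfterTextCompletion += (sender, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.Write("Reviewer: ");
codeReviewer.Submit(
    "Review this method:\n" +
    "public int Parse(string s) { return int.Parse(s); }");
Console.WriteLine("\n");

// Each persona keeps its own history, so follow-ups stay in context.
Console.Write("Reviewer: ");
codeReviewer.Submit("How would you handle invalid input?");
Console.WriteLine("\n");
```

Because each MultiTurnConversation holds its own chat history, switching personas never leaks context between them, even though they share one model in VRAM.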

Common Issues

Problem | Cause | Fix
Assistant forgets earlier context | Context window full | Increase ContextSize, or use AgentMemory for long-term recall
Responses cut off mid-sentence | MaximumCompletionTokens too low | Increase to 1024 or 2048
Slow first response | Model not cached in VRAM | First inference is slower; subsequent ones are faster
Session file too large | Long conversation history | Call chat.ClearHistory() periodically; rely on AgentMemory for recall
System prompt ignored | Prompt too long or vague | Keep system prompts under 200 words; be specific about behavior
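
To keep session files small, one option from the table above is to clear the in-context history once the context window runs low, relying on AgentMemory (Step 5) for anything that must survive. A hedged sketch; the 256-token threshold is an arbitrary example value, and ContextRemainingSpace is the property used in Step 4:

```csharp
// Inside the chat loop, after chat.Submit(input):
// trim in-context history when the context window is nearly full.
if (chat.ContextRemainingSpace < 256)
{
    Console.WriteLine("[Context nearly full; clearing chat history]");
    chat.ClearHistory();
}
```

Clearing history discards short-term context, so pair this with a memory-enabled conversation if the assistant must keep recalling earlier facts.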
