Remember Facts Across Chat Sessions with Automatic Memory Extraction

By default, when a user tells a chatbot "My name is David" in one session, that information is lost when the session ends. AgentMemory with automatic extraction solves this: the system analyzes each conversation turn, identifies facts worth remembering, and stores them as searchable vectors. In the next session, when the user asks "What is my name?", the memory is recalled automatically and the model answers "David".

This guide shows how to enable built-in LLM-based memory extraction so your MultiTurnConversation remembers information across independent sessions with no manual extraction code.

For background on manual memory population and memory types, see Use Agent Memory for Long-Term Knowledge Across Sessions.


Why This Matters

Two problems that automatic memory extraction solves:

  1. Users should not repeat themselves. A customer says "I use .NET 8 on Linux" in session one. In session two, the assistant should know this without being told again. Automatic extraction captures these facts as they appear in conversation.
  2. No custom extraction code needed. Without built-in extraction, developers must write pattern matching or post-processing logic to decide what to store. The LLM handles this out of the box, classifying each fact by type and importance.

Prerequisites

| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~300 MB (embedding) + 3+ GB (chat model) |

You need two models: an embedding model (for indexing and searching memories) and a chat model (for conversation and extraction).


Step 1: Create the Project

dotnet new console -n MemoryExtractionDemo
cd MemoryExtractionDemo
dotnet add package LM-Kit.NET

Step 2: Load Models

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Memory;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// Load embedding model for memory indexing and search
Console.WriteLine("Loading embedding model...");
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m",
    loadingProgress: p => { Console.Write($"\rLoading embeddings: {p * 100:F0}%   "); return true; });
Console.WriteLine();

// Load chat model for conversation and extraction
Console.WriteLine("Loading chat model...");
using LM chatModel = LM.LoadFromModelID("gemma3:4b",
    loadingProgress: p => { Console.Write($"\rLoading chat: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

Step 3: Create Memory with Automatic Extraction

var memory = new AgentMemory(embeddingModel);

// Enable automatic extraction: the LLM analyzes each turn and stores relevant facts
memory.ExtractionMode = MemoryExtractionMode.LlmBased;

// Wait for extraction to complete before proceeding to the next turn
memory.RunExtractionSynchronously = true;

Two key properties:

| Property | Default | Purpose |
|---|---|---|
| ExtractionMode | None | Set to LlmBased to enable automatic extraction |
| RunExtractionSynchronously | false | When true, extraction completes before the response is returned; when false, it runs in the background (fire and forget) |

Step 4: Session 1, Introduce Yourself

Build an agent with the memory attached and have a conversation. Extraction happens automatically after each agent execution.

// Build agent with memory
var agent = Agent.CreateBuilder(chatModel)
    .WithMemory(memory)
    .WithPlanning(PlanningStrategy.None)
    .Build();

var executor = new AgentExecutor();

// Session 1: user shares personal information
Console.WriteLine("=== Session 1 ===\n");
var result = executor.Execute(agent, "My name is David and I work on backend services with C#.");
Console.WriteLine($"Assistant: {result.Content}\n");

// Check what was extracted
Console.WriteLine($"Memory now contains {memory.DataSources.Count} data source(s):");
foreach (var ds in memory.DataSources)
    Console.WriteLine($"  [{AgentMemory.GetMemoryType(ds)}] {ds.Identifier}: {ds.Sections.Count()} section(s)");

Behind the scenes, the system:

  1. Sends the user message and assistant response to the LLM with a structured extraction schema.
  2. The LLM identifies facts (e.g., "The user's name is David", "The user works on backend services with C#").
  3. Each fact is classified by MemoryType (Semantic, Episodic, Procedural) and MemoryImportance (Low, Medium, High).
  4. Facts are deduplicated against existing memory via vector similarity.
  5. New facts are stored in AgentMemory.
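Step 4 (deduplication) amounts to a cosine-similarity check between the candidate fact's embedding and the embeddings already stored. The sketch below is illustrative only; the real logic is internal to AgentMemory, and the helper name is hypothetical:

```csharp
// Illustrative only: AgentMemory performs this check internally.
// A fact is skipped when its embedding is too close to an existing one.
static bool IsDuplicate(float[] candidate, IEnumerable<float[]> existing, float threshold = 0.85f)
{
    foreach (var stored in existing)
    {
        float dot = 0f, na = 0f, nb = 0f;
        for (int i = 0; i < candidate.Length; i++)
        {
            dot += candidate[i] * stored[i];
            na  += candidate[i] * candidate[i];
            nb  += stored[i] * stored[i];
        }
        float cosine = dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
        if (cosine >= threshold)
            return true; // near-duplicate: skip storing
    }
    return false;
}
```

The 0.85 threshold here mirrors the library's default deduplication sensitivity; raising it stores more near-duplicates, lowering it drops more borderline facts.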

Step 5: Persist Memory to Disk

// Serialize memory (simulating end of session)
string memoryPath = "conversation_memory.bin";
memory.Serialize(memoryPath);
Console.WriteLine($"\nMemory saved to {memoryPath}");

Step 6: Session 2, Recall in a New Conversation

Load the serialized memory and attach it to a fresh MultiTurnConversation. The user's name is recalled automatically via retrieval-augmented generation (RAG).

// Restore memory from disk
AgentMemory restoredMemory = AgentMemory.Deserialize(memoryPath, embeddingModel);

// New conversation with restored memory
Console.WriteLine("\n=== Session 2 (new conversation) ===\n");

using var conversation = new MultiTurnConversation(chatModel)
{
    SamplingMode = new GreedyDecoding(),
    Memory = restoredMemory
};

var response = conversation.Submit("What is my name?");
Console.WriteLine($"Assistant: {response.Completion}");
// Expected output contains "David"

The MultiTurnConversation has no chat history from session 1. It only has the restored AgentMemory. When the user asks "What is my name?", the system searches memory for relevant facts, finds "The user's name is David", injects it into the context, and the model responds with "David".
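Conceptually, this recall step is a standard RAG pipeline. The sketch below shows the flow; the helper names are hypothetical, since the real logic runs inside MultiTurnConversation when Memory is set:

```csharp
// Conceptual sketch only: these helpers are illustrative, not the library's API.
string question = "What is my name?";
float[] queryVector = Embed(question);              // 1. embed the user query
var facts = SearchMemory(queryVector, topK: 3);     // 2. vector search over stored facts
string context = string.Join("\n", facts);          // 3. e.g. "The user's name is David"
string prompt = $"Relevant memories:\n{context}\n\nUser: {question}";
// 4. the chat model answers grounded in the injected memories
```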


Step 7: Inspect and Filter Extracted Memories

Subscribe to BeforeMemoryStored to see what the LLM extracts and optionally cancel specific entries.

memory.BeforeMemoryStored += (sender, args) =>
{
    foreach (var mem in args.Memories)
    {
        Console.WriteLine($"  Extracted: {mem.Text} [{mem.MemoryType}, {mem.Importance}]");

        // Cancel low-importance memories
        if (mem.Importance == MemoryImportance.Low)
            mem.Cancel = true;
    }
};

MemoryExtractionEventArgs properties:

| Property | Type | Purpose |
|---|---|---|
| Memories | IReadOnlyList&lt;ExtractedMemory&gt; | The extracted facts |
| UserMessage | string | The user message that was analyzed |
| AssistantResponse | string | The assistant response that was analyzed |
| CancelAll | bool | Set true to skip all extracted memories |
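For example, CancelAll can veto extraction for an entire turn. The keyword check below is just an illustration of the pattern:

```csharp
memory.BeforeMemoryStored += (sender, args) =>
{
    // Skip everything extracted from turns the user marked as off the record
    if (args.UserMessage.Contains("off the record", StringComparison.OrdinalIgnoreCase))
        args.CancelAll = true;
};
```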

Each ExtractedMemory exposes:

| Property | Type | Purpose |
|---|---|---|
| Text | string | The extracted fact |
| MemoryType | MemoryType | Semantic, Episodic, or Procedural |
| Importance | MemoryImportance | Low, Medium, or High |
| Category | string | Label (e.g., personal_info, preference, project) |
| Cancel | bool | Set true to skip this specific memory |
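The Category label supports targeted filtering. A sketch that keeps only identity and preference facts (the category names are the examples from the table above):

```csharp
memory.BeforeMemoryStored += (sender, args) =>
{
    var keep = new HashSet<string> { "personal_info", "preference" };
    foreach (var mem in args.Memories)
    {
        if (!keep.Contains(mem.Category))
            mem.Cancel = true; // drop facts outside the allowed categories
    }
};
```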

Step 8: Configure Extraction Behavior

Fine-tune how extraction works:

// Use a separate lightweight model for extraction (saves VRAM on the chat model);
// 'extractionModel' is another LM instance loaded beforehand
memory.ExtractionModel = extractionModel;

// Custom guidance to focus extraction on specific domains
memory.ExtractionPrompt =
    "Only extract facts about user identity, preferences, and technical stack.\n" +
    "Ignore greetings and small talk.";

// Control how many facts are stored per conversation turn
memory.MaxExtractionsPerTurn = 3;

// Adjust deduplication sensitivity (0.0 = no dedup, 1.0 = exact match only)
memory.DeduplicationThreshold = 0.85f;

// Maximum tokens for the extraction LLM call
memory.MaxExtractionCompletionTokens = 512;

| Property | Default | Purpose |
|---|---|---|
| ExtractionModel | null (uses agent's chat model) | Separate model for extraction |
| ExtractionPrompt | Built-in guidance | Custom rules for what to extract |
| MaxExtractionsPerTurn | 5 | Maximum facts stored per turn |
| DeduplicationThreshold | 0.85 | Vector similarity threshold for duplicate detection |
| MaxExtractionCompletionTokens | 512 | Token budget for the extraction call |
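The configuration snippet above assumes extractionModel is already loaded. A sketch of wiring in a smaller model for extraction; the model ID is an assumption, substitute any compact instruct model available in your catalog:

```csharp
// Hypothetical compact model for extraction; any small chat model works
using LM extractionModel = LM.LoadFromModelID("qwen2.5:0.5b");
memory.ExtractionModel = extractionModel;
```

Offloading extraction to a sub-1B model keeps the main chat model's VRAM and latency budget for user-facing responses.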

Step 9: Set Capacity Limits

When extraction runs continuously, memory can grow unbounded. Use MaxMemoryEntries to cap it and EvictionPolicy to control what gets removed.

using LMKit.Agents.Memory;

// Keep at most 200 extracted memories; evict least important first
memory.MaxMemoryEntries = 200;
memory.EvictionPolicy = MemoryEvictionPolicy.LowestImportanceFirst;

// Monitor evictions
memory.MemoryEvicted += (sender, e) =>
{
    Console.WriteLine($"Evicted: \"{e.Text}\" ({e.CreatedAt})");
};

Every extracted memory receives a created_at timestamp and an importance metadata value. The eviction engine uses these to decide which entries to remove first.


Step 10: Consolidate Redundant Memories

Automatic extraction can produce overlapping entries over many sessions. ConsolidateAsync clusters similar entries and merges each cluster into one consolidated fact using an LLM.

// Periodically consolidate to keep memory lean
var result = await memory.ConsolidateAsync(chatModel);
Console.WriteLine($"Merged {result.ClustersMerged} cluster(s), entries: {result.EntryCountBefore} -> {result.EntryCountAfter}");

Set ConsolidationSimilarityThreshold (default 0.7) to control how aggressively entries are grouped.
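For instance, tightening the threshold merges only near-identical entries, while lowering it groups more aggressively (the values here are illustrative):

```csharp
// Merge only very similar entries (conservative consolidation)
memory.ConsolidationSimilarityThreshold = 0.85f;
var conservative = await memory.ConsolidateAsync(chatModel);

// Group more loosely related entries into single facts
memory.ConsolidationSimilarityThreshold = 0.6f;
var aggressive = await memory.ConsolidateAsync(chatModel);
```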


Step 11: Summarize Conversations on Session End

Call SummarizeConversationAsync before disposing a conversation to capture the session highlights as episodic memory entries.

// At the end of a session
var summaryResult = await memory.SummarizeConversationAsync(executor.ChatHistory, chatModel);
Console.WriteLine($"Stored {summaryResult.EntriesCreated} episode(s) from {summaryResult.MessagePairsSummarized} exchanges.");

Complete Example

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Memory;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

string memoryPath = "conversation_memory.bin";

// Load models
using LM embeddingModel = LM.LoadFromModelID("embeddinggemma-300m");
using LM chatModel = LM.LoadFromModelID("gemma3:4b");

// Create or restore memory
AgentMemory memory;
if (File.Exists(memoryPath))
{
    memory = AgentMemory.Deserialize(memoryPath, embeddingModel);
    Console.WriteLine($"Restored {memory.DataSources.Count} memory source(s) from disk.\n");
}
else
{
    memory = new AgentMemory(embeddingModel);
}

// Enable automatic extraction
memory.ExtractionMode = MemoryExtractionMode.LlmBased;
memory.RunExtractionSynchronously = true;

// Build agent
var agent = Agent.CreateBuilder(chatModel)
    .WithMemory(memory)
    .WithPlanning(PlanningStrategy.None)
    .Build();

var executor = new AgentExecutor();

// Chat loop
while (true)
{
    Console.Write("You: ");
    string? input = Console.ReadLine();

    if (string.IsNullOrWhiteSpace(input))
        continue;

    if (input.Equals("quit", StringComparison.OrdinalIgnoreCase))
    {
        memory.Serialize(memoryPath);
        Console.WriteLine("Memory saved. Goodbye!");
        break;
    }

    var result = executor.Execute(agent, input);
    Console.WriteLine($"Assistant: {result.Content}\n");
}

Run this, say "My name is David", type quit, run again, and ask "What is my name?". The assistant replies with "David" despite being a fresh session.
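An illustrative two-run transcript (exact model wording will vary):

```
$ dotnet run
You: My name is David and I work on backend services with C#.
Assistant: Nice to meet you, David! ...
You: quit
Memory saved. Goodbye!

$ dotnet run
Restored 1 memory source(s) from disk.
You: What is my name?
Assistant: Your name is David.
```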