
👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/persistent_memory_assistant

Persistent Memory Assistant for C# .NET Applications


🎯 Purpose of the Demo

Persistent Memory Assistant demonstrates how to use LM-Kit.NET to build an AI assistant whose long-term memory persists across conversation sessions, powered by the AgentMemory system with built-in automatic memory extraction.

The sample shows how to:

  • Enable LLM-based memory extraction via MemoryExtractionMode.LlmBased on AgentMemory.
  • Automatically extract and store facts from conversations using grammar-constrained structured output.
  • Load and configure embedding models for memory retrieval.
  • Persist memories to disk and reload them across sessions.
  • Subscribe to the BeforeMemoryStored event to inspect or filter extracted memories.
  • Use RAG-based retrieval to enhance responses with relevant memories.

Why Agent Memory with LM-Kit.NET?

  • Automatic learning: the agent extracts facts, preferences, and context from every conversation turn without custom code.
  • Personalization: the assistant remembers user preferences, projects, and context.
  • Context continuity: conversations build on prior interactions across sessions.
  • Local-first: all memory storage, extraction, and retrieval runs on your hardware.
  • Semantic search: find relevant memories based on meaning, not keywords.
  • Deduplication: duplicate facts are automatically detected and skipped.

👥 Target Audience

  • Product Developers: build personalized AI assistants that learn user preferences.
  • Enterprise Teams: create context-aware assistants for ongoing projects.
  • CRM & Support: develop assistants that remember customer interactions.
  • Personal Productivity: build AI companions that understand your workflow over time.
  • AI/ML Engineers: explore RAG-based memory systems with local inference.

🚀 What Problem It Solves

  • Context across sessions: assistant remembers information from previous conversations.
  • Personalized responses: uses stored facts to provide tailored assistance.
  • Organized memory: different memory types for facts, events, and preferences.
  • Persistent storage: memories survive application restarts.
  • Zero-code extraction: no custom pattern matching or regex needed.

💻 Demo Application Overview

The demo is a console app that:

  • Lets you choose from 7 models suitable for conversational memory tasks.
  • Loads both chat model and embedding model for memory operations.
  • Creates an Agent with AgentMemory configured for automatic extraction.
  • Loads existing memories from disk if available.
  • Enters an interactive chat loop where you can:
    • Chat naturally and share information about yourself.
    • Use /new to start a fresh chat session while keeping all memories.
    • Use /remember to explicitly store information.
    • Use /memories to view stored memory sources.
    • Use /save and /load to manage persistence.
  • Automatically extracts facts from conversations using the LLM with structured output.
  • Auto-saves memories on exit.
  • Loops until you type quit to exit.

✨ Key Features

  • Built-in LLM Extraction: automatic fact extraction using MemoryExtractionMode.LlmBased with no custom code.
  • Three Memory Types: Semantic (facts), Episodic (events), Procedural (preferences), auto-classified by the LLM.
  • Deduplication: similar memories are detected and skipped before storage.
  • Capacity Limits and Eviction: configurable MaxMemoryEntries with automatic eviction of oldest or least important entries when the limit is reached.
  • Memory Consolidation: merge similar memories into concise summaries via /consolidate, using LLM-powered summarization with a configurable similarity threshold.
  • Conversation Summarization: summarize the current session into episodic memory via /summarize, capturing key topics and decisions for future recall.
  • Automatic Timestamps: every memory entry receives a created_at timestamp for eviction ordering and time-decay scoring.
  • Time-Decay Scoring: configurable TimeDecayHalfLife so recent memories rank higher during retrieval.
  • BeforeMemoryStored Event: inspect, modify, or cancel extracted memories before they are stored.
  • MemoryEvicted Event: monitor or cancel evictions when capacity is exceeded.
  • Disk Persistence: save/load memory state across application sessions.
  • Command Interface: explicit control over memory operations, including /capacity to set limits at runtime.
  • Embedding Model: dedicated model for semantic similarity search.

Built-In Models (menu)

On startup, the sample shows a model selection menu:

Option  Model                       Approx. VRAM Needed
0       Google Gemma 3 12B          ~9 GB
1       Microsoft Phi-4 Mini 3.8B   ~3.3 GB
2       Meta Llama 3.1 8B           ~6 GB
3       Alibaba Qwen-3 8B           ~5.6 GB
4       Microsoft Phi-4 14.7B       ~11 GB
5       OpenAI GPT OSS 20B          ~16 GB
6       Z.ai GLM 4.7 Flash 30B      ~18 GB
other   Custom model URI            depends on model

Additional model loaded automatically:

  • Embedding model: embeddinggemma-300m for semantic memory retrieval.

Total VRAM usage is chat model + embedding model (~300 MB for embeddinggemma-300m).


Commands & Flow

Interactive Commands

Command                   Description
/new                      Start a new chat session (clears conversation history, keeps memories).
/remember <info>          Explicitly store information in memory.
/memories                 List all stored memory sources and their sections.
/capacity [n]             Show or set maximum memory entries (0 for unlimited).
/decay [days]             Show or set time-decay half-life in days (0 to disable).
/clear                    Clear all memories from the current session.
/save                     Save memories to disk (./agent_memory.bin).
/load                     Load memories from disk.
/consolidate [threshold]  Merge similar memories into consolidated entries; optional similarity threshold (default: 0.85).
/summarize                Summarize the current conversation and store it as a memory.
/help                     Show all available commands.
quit                      Exit the application (auto-saves memories).
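A chat loop that routes these commands can be as simple as splitting the first token off the input. This is a hypothetical sketch of the routing step, not the sample's actual code; the class and method names are illustrative.

```csharp
using System;

static class CommandRouter
{
    // Splits "/remember I prefer dark mode" into ("/remember", "I prefer dark mode").
    // Input that does not start with "/" is treated as a plain chat message.
    // Illustrative only -- the demo's real dispatch code may differ.
    public static (string Command, string Argument) Parse(string input)
    {
        input = input.Trim();
        if (!input.StartsWith("/")) return ("", input); // plain chat message
        int space = input.IndexOf(' ');
        return space < 0
            ? (input, "")
            : (input[..space], input[(space + 1)..].Trim());
    }
}
```

The caller can then switch on the returned command name and fall through to normal response generation when the command is empty.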

Conversation Flow

  1. Startup: Model selection and loading (chat + embedding).
  2. Memory Load: Attempts to load existing memories from disk.
  3. Chat Loop:
    • User enters message or command.
    • Commands are processed immediately.
    • Messages trigger memory retrieval and response generation.
    • New facts are automatically extracted by the LLM and stored (with deduplication).
  4. Exit: Memories auto-save on quit.

Memory Types Explained

Type        Purpose                    Examples
Semantic    Facts and knowledge        "Alex works at TechCorp", "User prefers TypeScript"
Episodic    Events and experiences     "Discussed API design on Monday", "Project deadline next week"
Procedural  Preferences and processes  "Always format code in C#", "Prefers detailed explanations"

Memory types are automatically classified by the LLM during extraction. Each extracted fact includes a type, importance level, and category.


Example Conversations

Building Memory Over Time

User: My name is Alex and I work as a software engineer at TechCorp.
  [Memory extracted: The user's name is Alex (Semantic, High)]
  [Memory extracted: The user works as a software engineer at TechCorp (Semantic, Medium)]
Assistant: Nice to meet you, Alex! I'll remember that you're a software
engineer at TechCorp. What kind of projects do you work on?

User: I mainly work on backend services using C# and .NET.
  [Memory extracted: The user works on backend services using C# and .NET (Semantic, Medium)]
Assistant: Got it! So you focus on backend development with C# and .NET
at TechCorp. That's a solid tech stack.

User: /new
Started a new chat session. Conversation history cleared, memories preserved.

User: What do you know about me?
Assistant: Based on what I recall, you're Alex, a software engineer
at TechCorp who specializes in backend services using C# and .NET.

Using Commands

User: /remember I prefer dark mode in all applications
Stored in memory: "I prefer dark mode in all applications"

User: /memories
Memory sources (2):
  [Semantic] semantic_memories - 3 section(s)
  [Semantic] user_memories - 1 section(s)

User: /save
Saved 2 memory sources to ./agent_memory.bin

Agent Configuration

using LMKit.Agents;
using LMKit.Agents.Memory;
using LMKit.Model;

// Load models
LM chatModel = new LM(new Uri(chatModelUri));
LM embeddingModel = new LM(new Uri(embeddingModelUri));

// Create or load memory
var memory = File.Exists(MEMORY_FILE_PATH)
    ? AgentMemory.Deserialize(MEMORY_FILE_PATH, embeddingModel)
    : new AgentMemory(embeddingModel);

// Enable automatic LLM-based memory extraction
memory.ExtractionMode = MemoryExtractionMode.LlmBased;
memory.RunExtractionSynchronously = true;

// Optional: inspect what gets extracted
memory.BeforeMemoryStored += (sender, args) =>
{
    foreach (var mem in args.Memories)
        Console.WriteLine($"Extracted: {mem.Text} ({mem.MemoryType})");
};

// Build agent with memory
var agent = Agent.CreateBuilder(chatModel)
    .WithPersona("You are a helpful, conversational personal assistant.")
    .WithPlanning(PlanningStrategy.None)
    .WithMemory(memory)
    .Build();

// Execute conversation
var executor = new AgentExecutor();
var result = executor.Execute(agent, userInput);

// Save to disk
memory.Serialize(MEMORY_FILE_PATH);

🏗️ Architecture

+--------------------------------------------------+
|                User Message                       |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|              Memory Retrieval                     |
|    (Semantic search for relevant memories)        |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|              Context Enhancement                  |
|    (Inject relevant memories into prompt)         |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|                Agent Response                     |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|         Automatic Memory Extraction               |
|  (LLM-based structured extraction + dedup)        |
+--------------------------------------------------+

Memory Storage Location

Memories are saved to: ./agent_memory.bin

The binary format includes:

  • All data sources with their sections
  • Embedded vectors for semantic search
  • Memory type classifications

Behavior & Policies

  • Model loading: requires both chat model and embedding model.
  • Memory retrieval: automatic RAG-based context enhancement per query.
  • Fact extraction: LLM-based with grammar-constrained structured output.
  • Deduplication: extracted memories are checked against existing memory via vector similarity.
  • Auto-save: memories automatically saved on graceful exit.
  • Memory isolation: each session can have independent memory instances.
  • Licensing: set an optional license key via LicenseManager.SetLicenseKey("").
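The deduplication bullet above relies on vector similarity between embeddings. LM-Kit's internal metric is not exposed in the sample, but a cosine-similarity check against a threshold (like the 0.85 default used by /consolidate) can be sketched as follows; the class and method names are illustrative.

```csharp
using System;

static class Similarity
{
    // Cosine similarity between two embedding vectors.
    // Values near 1.0 indicate near-duplicate content.
    // Illustrative sketch -- not LM-Kit's actual deduplication code.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // A candidate memory is skipped when it is too similar to an existing one.
    public static bool IsDuplicate(double[] a, double[] b, double threshold = 0.85)
        => Cosine(a, b) >= threshold;
}
```

Raising the threshold makes deduplication stricter (only near-identical facts are skipped); lowering it merges more aggressively.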

⚙️ Getting Started

Prerequisites

  • .NET 8.0 or later
  • Sufficient VRAM for chat model + embedding model (~4-19 GB total)

Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/agents/persistent_memory_assistant

Run

dotnet build
dotnet run

Then:

  1. Select a model by typing 0-6, or paste a custom model URI.
  2. Wait for models to download (first run) and load.
  3. Chat naturally and share information about yourself.
  4. Use commands to manage memories explicitly.
  5. Memories auto-load on restart.
  6. Type quit to exit (memories auto-save).

🔧 Troubleshooting

  • "No memories stored yet"

    • Share some information in conversation first.
    • Use /remember <info> to store explicitly.
  • Memory not persisting

    • Ensure you exit with quit for auto-save.
    • Use /save to save manually.
    • Check write permissions for ./agent_memory.bin.
  • Slow memory retrieval

    • Embedding model needs to be loaded (adds ~300 MB VRAM).
    • First query may be slower as embeddings are computed.
  • Out-of-memory errors

    • Total VRAM = chat model + embedding model.
    • Pick a smaller chat model if needed.
  • Assistant doesn't remember

    • Check /memories to see what's stored.
    • Use /remember to store explicitly.
    • Set RunExtractionSynchronously = true to ensure extraction completes before the next turn.

🚀 Extend the Demo

  • Custom extraction model: set ExtractionModel to a lightweight model for faster extraction.
  • Custom guidance: set ExtractionPrompt to focus extraction on specific domains.
  • Memory filtering: use BeforeMemoryStored to apply custom business rules.
  • Deduplication tuning: adjust DeduplicationThreshold for stricter or looser duplicate detection.
  • Cloud persistence: store memories in databases or cloud storage.
  • Memory sharing: transfer memories between agents or sessions.
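For the memory-filtering idea above, the business rule itself can be a plain predicate; how it plugs into the BeforeMemoryStored event (for example, by removing entries from args.Memories, assuming that collection is mutable) depends on the LM-Kit API. The class, method, and default below are hypothetical.

```csharp
static class MemoryFilter
{
    // Example business rule: ignore extracted facts that are too short
    // to be meaningful. Illustrative only -- choose rules that fit your
    // domain, and wire this into the BeforeMemoryStored handler.
    public static bool IsWorthStoring(string text, int minLength = 12)
        => !string.IsNullOrWhiteSpace(text) && text.Trim().Length >= minLength;
}
```

Inside the event handler shown in the Agent Configuration section, each extracted memory's Text could be passed through a predicate like this before it is allowed to persist.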

📚 Additional Resources
