👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/persistent_memory_assistant

Persistent Memory Assistant for C# .NET Applications


Purpose of the Sample

Persistent Memory Assistant demonstrates how to use LM-Kit.NET to build an AI assistant with long-term memory that persists across conversation sessions using the AgentMemory system.

The sample shows how to:

  • Create an AI agent with semantic memory using AgentMemory.
  • Load and configure embedding models for memory retrieval.
  • Store information in different memory types (Semantic, Episodic, Procedural).
  • Persist memories to disk and reload them across sessions.
  • Automatically extract and store facts from conversations.
  • Use RAG-based retrieval to enhance responses with relevant memories.

Why Agent Memory with LM-Kit.NET?

  • Personalization: the assistant remembers user preferences, projects, and context.
  • Context continuity: conversations build on prior interactions across sessions.
  • Local-first: all memory storage and retrieval runs on your hardware.
  • Semantic search: find relevant memories based on meaning, not keywords.
  • Flexible storage: save and load memories for persistence or transfer.

Target Audience

  • Product Developers: build personalized AI assistants that learn user preferences.
  • Enterprise Teams: create context-aware assistants for ongoing projects.
  • CRM & Support: develop assistants that remember customer interactions.
  • Personal Productivity: build AI companions that understand your workflow over time.
  • AI/ML Engineers: explore RAG-based memory systems with local inference.

Problem Solved

  • Context across sessions: assistant remembers information from previous conversations.
  • Personalized responses: uses stored facts to provide tailored assistance.
  • Organized memory: different memory types for facts, events, and preferences.
  • Persistent storage: memories survive application restarts.
  • Explicit and implicit learning: store memories via commands or automatic extraction.

Sample Application Description

A console application that:

  • Lets you choose from five models suited to conversational memory tasks.
  • Loads both chat model and embedding model for memory operations.
  • Creates an Agent with AgentMemory for persistent context.
  • Loads existing memories from disk if available.
  • Enters an interactive chat loop where you can:
    • Chat naturally and share information about yourself.
    • Use /remember to explicitly store information.
    • Use /memories to view stored memory sources.
    • Use /save and /load to manage persistence.
  • Automatically extracts facts from conversations using pattern matching.
  • Auto-saves memories on exit.
  • Loops until you type quit to exit.

Key Features

  • AgentMemory Integration: RAG-based semantic memory storage and retrieval.
  • Three Memory Types: Semantic (facts), Episodic (events), Procedural (preferences).
  • Automatic Fact Extraction: detects and stores information from natural conversation.
  • Disk Persistence: save/load memory state across application sessions.
  • Command Interface: explicit control over memory operations.
  • Embedding Model: dedicated model for semantic similarity search.

Built-In Models (menu)

On startup, the sample shows a model selection menu:

Option   Model                       Approx. VRAM needed
0        Google Gemma 3 4B           ~4 GB
1        Microsoft Phi-4 Mini 3.8B   ~3.3 GB
2        Meta Llama 3.1 8B           ~6 GB
3        Alibaba Qwen-3 8B           ~5.6 GB
4        Microsoft Phi-4 14.7B       ~11 GB
other    Custom model URI            depends on the model

Additional model loaded automatically:

  • Embedding model: bge-m3 - used for semantic memory retrieval.

Total VRAM usage is the chat model plus the embedding model (bge-m3 adds ~1.5 GB).


Supported Models

The sample works with any instruction-following model. The built-in menu includes:

  • gemma3:4b - compact and efficient
  • phi4-mini:3.8b - small footprint
  • llama3.1:8b - general purpose
  • qwen3:8b - strong multilingual support
  • phi4:14.7b - advanced reasoning

Embedding model (required):

  • bge-m3 - multilingual embeddings for semantic search

Internally:

// Load chat model
LM model = new LM(new Uri(modelUri));

// Load embedding model for memory
LM embeddingModel = new LM(new Uri(EMBEDDING_MODEL_PATH));

// Create memory with embedding model
var memory = new AgentMemory(embeddingModel);

Commands & Flow

Interactive Commands

Command            Description
/remember <info>   Explicitly store information in memory.
/memories          List all stored memory sources and their sections.
/clear             Clear all memories from the current session.
/save              Save memories to disk (./agent_memory.bin).
/load              Load memories from disk.
/help              Show all available commands.
quit               Exit the application (auto-saves memories).
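
The command handling can be sketched as a simple dispatch loop. This is an illustrative outline, not the sample's exact code: StoreMemory, ListMemories, ClearMemories, LoadMemories, ShowHelp, and RespondWithMemory are hypothetical helper names, while memory.Serialize and MEMORY_FILE_PATH match the calls shown in the Agent Configuration section.

while (true)
{
    Console.Write("> ");
    string? input = Console.ReadLine()?.Trim();
    if (string.IsNullOrEmpty(input)) continue;

    if (input.Equals("quit", StringComparison.OrdinalIgnoreCase))
    {
        memory.Serialize(MEMORY_FILE_PATH);  // auto-save on exit
        break;
    }
    else if (input.StartsWith("/remember ")) StoreMemory(input["/remember ".Length..]);
    else if (input == "/memories")           ListMemories();
    else if (input == "/clear")              ClearMemories();
    else if (input == "/save")               memory.Serialize(MEMORY_FILE_PATH);
    else if (input == "/load")               LoadMemories();
    else if (input == "/help")               ShowHelp();
    else                                     RespondWithMemory(input);  // retrieval + response + fact extraction
}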

Conversation Flow

  1. Startup: Model selection and loading (chat + embedding).
  2. Memory Load: Attempts to load existing memories from disk.
  3. Chat Loop:
    • User enters message or command.
    • Commands are processed immediately.
    • Messages trigger memory retrieval and response generation.
    • New facts are automatically extracted and stored.
  4. Exit: Memories auto-save on quit.

Memory Types Explained

Type         Purpose                     Examples
Semantic     Facts and knowledge         "Alex works at TechCorp", "User prefers TypeScript"
Episodic     Events and experiences      "Discussed API design on Monday", "Project deadline next week"
Procedural   Preferences and processes   "Always format code in C#", "Prefers detailed explanations"

Memory types are automatically assigned based on conversation patterns:

var patterns = new[]
{
    ("my name is ", MemoryType.Semantic),
    ("i work ", MemoryType.Semantic),
    ("i prefer ", MemoryType.Procedural),
    ("yesterday ", MemoryType.Episodic),
    ("my project ", MemoryType.Episodic),
};
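
A minimal extraction helper applying these patterns might look as follows. The loop and helper name are illustrative, but SaveInformation is called with the same signature used in the Agent Configuration section:

void ExtractAndStoreFacts(AgentMemory memory, string userMessage)
{
    string lowered = userMessage.ToLowerInvariant();

    foreach (var (trigger, memoryType) in patterns)
    {
        if (!lowered.Contains(trigger))
            continue;

        // Route the fact to a per-type data source, e.g. "semantic_memories".
        memory.SaveInformation(
            dataSourceId: $"{memoryType.ToString().ToLowerInvariant()}_memories",
            content: $"User said: {userMessage}",
            sectionId: $"conversation_{DateTime.Now.Ticks}",
            memoryType: memoryType);
        break;  // store each message once, under the first matching type
    }
}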

Example Conversations

Building Memory Over Time

User: My name is Alex and I work as a software engineer at TechCorp.
Assistant: Nice to meet you, Alex! I'll remember that you're a software
engineer at TechCorp. What kind of projects do you work on?
[Stored new semantic memory]

User: I mainly work on backend services using C# and .NET.
Assistant: Got it! So you focus on backend development with C# and .NET
at TechCorp. That's a solid tech stack.
[Stored new semantic memory]

[Later or in a new session]

User: What do you know about me?
Assistant: Based on what you've told me, you're Alex, a software engineer
at TechCorp who specializes in backend services using C# and .NET.

Using Commands

User: /remember I prefer dark mode in all applications
Stored in memory: "I prefer dark mode in all applications"

User: /memories
Memory sources (2):
  [Semantic] semantic_memories - 3 section(s)
  [Procedural] procedural_memories - 1 section(s)

User: /save
Saved 2 memory sources to ./agent_memory.bin

Agent Configuration

using LMKit.Agents;
using LMKit.Model;

// Load models
LM chatModel = new LM(new Uri(chatModelUri));
LM embeddingModel = new LM(new Uri(embeddingModelUri));

// Create or load memory
var memory = File.Exists(MEMORY_FILE_PATH)
    ? AgentMemory.Deserialize(MEMORY_FILE_PATH, embeddingModel)
    : new AgentMemory(embeddingModel);

// Build agent with memory
var agent = Agent.CreateBuilder(chatModel)
    .WithPersona(@"You are a helpful personal assistant with persistent memory.
You remember information users share with you and use it to provide personalized assistance.

When users share personal information (name, preferences, projects, etc.):
- Acknowledge and confirm what you've learned
- Use this information naturally in future responses

When answering questions:
- Check if you have relevant memories that could help
- Reference past conversations when appropriate
- Be conversational and personable")
    .WithPlanning(PlanningStrategy.None)
    .WithMemory(memory)
    .Build();

// Execute conversation
var executor = new AgentExecutor();
var result = executor.Execute(agent, userInput, cancellationToken);

// Store new information from conversation
memory.SaveInformation(
    dataSourceId: "conversation_memories",
    content: "User said: I prefer dark mode",
    sectionId: $"conversation_{DateTime.Now.Ticks}",
    memoryType: MemoryType.Procedural);

// Save to disk
memory.Serialize(MEMORY_FILE_PATH);

Architecture

┌─────────────────────────────────────────────────┐
│                User Message                      │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│              Memory Retrieval                    │
│    (Semantic search for relevant memories)       │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│              Context Enhancement                 │
│    (Inject relevant memories into prompt)        │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│                Agent Response                    │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│              Memory Storage                      │
│    (Store new facts from conversation)           │
└─────────────────────────────────────────────────┘
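
Because the agent is built with .WithMemory(memory), the retrieval and context-enhancement stages run inside the executor, so one turn of the pipeline above reduces to execute-then-store. The helper below is a sketch using only the calls shown in the Agent Configuration section; the exact shape of the executor's result is not part of this sample's listing.

void RunTurn(Agent agent, AgentMemory memory, string userInput, CancellationToken ct)
{
    // Memory retrieval, context enhancement, and response generation all
    // happen inside Execute, since the agent was built with .WithMemory(memory).
    var result = new AgentExecutor().Execute(agent, userInput, ct);
    Console.WriteLine(result);  // print the reply (result type details not shown in this sample)

    // Memory storage: keep anything from this turn worth remembering.
    memory.SaveInformation(
        dataSourceId: "conversation_memories",
        content: $"User said: {userInput}",
        sectionId: $"conversation_{DateTime.Now.Ticks}",
        memoryType: MemoryType.Semantic);
}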

Memory Storage Location

Memories are saved to: ./agent_memory.bin

The binary format includes:

  • All data sources with their sections
  • Embedded vectors for semantic search
  • Memory type classifications

Behavior & Policies

  • Model loading: requires both chat model and embedding model.
  • Memory retrieval: automatic RAG-based context enhancement per query.
  • Fact extraction: pattern-based detection of information worth storing.
  • Auto-save: memories automatically saved on graceful exit.
  • Memory isolation: each session can have independent memory instances.
  • Licensing: set an optional license key via LicenseManager.SetLicenseKey("").

Getting Started

Prerequisites

  • .NET 8.0 or later
  • Sufficient VRAM for chat model + embedding model (~5-12 GB total)

Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/agents/persistent_memory_assistant

Run

dotnet build
dotnet run

Then:

  1. Select a model by typing 0-4, or paste a custom model URI.
  2. Wait for models to download (first run) and load.
  3. Chat naturally and share information about yourself.
  4. Use commands to manage memories explicitly.
  5. Memories auto-load on restart.
  6. Type quit to exit (memories auto-save).

Troubleshooting

  • "No memories stored yet"

    • Share some information in conversation first.
    • Use /remember <info> to store explicitly.
  • Memory not persisting

    • Ensure you exit with quit for auto-save.
    • Use /save to save manually.
    • Check write permissions for ./agent_memory.bin.
  • Slow memory retrieval

    • Embedding model needs to be loaded (adds ~1.5 GB VRAM).
    • First query may be slower as embeddings are computed.
  • Out-of-memory errors

    • Total VRAM = chat model + embedding model.
    • Pick a smaller chat model if needed.
  • Assistant doesn't remember

    • Information may not match extraction patterns.
    • Use /remember to store explicitly.
    • Check /memories to see what's stored.

Extend the Demo

  • Custom extraction: use LLM to extract entities instead of patterns.
  • Memory importance: add relevance scoring and memory decay.
  • Memory categories: create domain-specific memory types.
  • Cloud persistence: store memories in databases or cloud storage.
  • Memory sharing: transfer memories between agents or sessions.
  • Selective forgetting: implement commands to remove specific memories.

Additional Resources