👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/persistent_memory_assistant
# Persistent Memory Assistant for C# .NET Applications

## Purpose of the Sample

Persistent Memory Assistant demonstrates how to use LM-Kit.NET to build an AI assistant with long-term memory that persists across conversation sessions using the `AgentMemory` system.
The sample shows how to:
- Create an AI agent with semantic memory using `AgentMemory`.
- Load and configure embedding models for memory retrieval.
- Store information in different memory types (Semantic, Episodic, Procedural).
- Persist memories to disk and reload them across sessions.
- Automatically extract and store facts from conversations.
- Use RAG-based retrieval to enhance responses with relevant memories.
## Why Agent Memory with LM-Kit.NET?
- Personalization: the assistant remembers user preferences, projects, and context.
- Context continuity: conversations build on prior interactions across sessions.
- Local-first: all memory storage and retrieval runs on your hardware.
- Semantic search: find relevant memories based on meaning, not keywords.
- Flexible storage: save and load memories for persistence or transfer.
## Target Audience
- Product Developers: build personalized AI assistants that learn user preferences.
- Enterprise Teams: create context-aware assistants for ongoing projects.
- CRM & Support: develop assistants that remember customer interactions.
- Personal Productivity: build AI companions that understand your workflow over time.
- AI/ML Engineers: explore RAG-based memory systems with local inference.
## Problem Solved
- Context across sessions: assistant remembers information from previous conversations.
- Personalized responses: uses stored facts to provide tailored assistance.
- Organized memory: different memory types for facts, events, and preferences.
- Persistent storage: memories survive application restarts.
- Explicit and implicit learning: store memories via commands or automatic extraction.
## Sample Application Description

A console app that:
- Lets you choose from 5 models suitable for conversational memory tasks.
- Loads both chat model and embedding model for memory operations.
- Creates an `Agent` with `AgentMemory` for persistent context.
- Loads existing memories from disk if available.
- Enters an interactive chat loop where you can:
  - Chat naturally and share information about yourself.
  - Use `/remember` to explicitly store information.
  - Use `/memories` to view stored memory sources.
  - Use `/save` and `/load` to manage persistence.
- Automatically extracts facts from conversations using pattern matching.
- Auto-saves memories on exit.
- Loops until you type `quit` to exit.
## Key Features
- AgentMemory Integration: RAG-based semantic memory storage and retrieval.
- Three Memory Types: Semantic (facts), Episodic (events), Procedural (preferences).
- Automatic Fact Extraction: detects and stores information from natural conversation.
- Disk Persistence: save/load memory state across application sessions.
- Command Interface: explicit control over memory operations.
- Embedding Model: dedicated model for semantic similarity search.
## Built-In Models (menu)
On startup, the sample shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | Google Gemma 3 4B | ~4 GB VRAM |
| 1 | Microsoft Phi-4 Mini 3.8B | ~3.3 GB VRAM |
| 2 | Meta Llama 3.1 8B | ~6 GB VRAM |
| 3 | Alibaba Qwen-3 8B | ~5.6 GB VRAM |
| 4 | Microsoft Phi-4 14.7B | ~11 GB VRAM |
| other | Custom model URI | depends on model |
Additional model loaded automatically:
- Embedding model: `bge-m3`, used for semantic memory retrieval.

Total VRAM usage is the chat model plus the embedding model (~1.5 GB for `bge-m3`).
## Supported Models

The sample works with any instruction-following model:

- `gemma3:4b` - compact and efficient
- `phi4-mini:3.8b` - small footprint
- `llama3.1:8b` - general purpose
- `qwen3:8b` - strong multilingual support
- `phi4:14.7b` - advanced reasoning

Embedding model (required):

- `bge-m3` - multilingual embeddings for semantic search
Internally:

```csharp
// Load the chat model
LM model = new LM(new Uri(modelUri));

// Load the embedding model used for memory retrieval
LM embeddingModel = new LM(new Uri(EMBEDDING_MODEL_PATH));

// Create the memory store backed by the embedding model
var memory = new AgentMemory(embeddingModel);
```
## Commands & Flow

### Interactive Commands

| Command | Description |
|---|---|
| `/remember <info>` | Explicitly store information in memory. |
| `/memories` | List all stored memory sources and their sections. |
| `/clear` | Clear all memories from the current session. |
| `/save` | Save memories to disk (`./agent_memory.bin`). |
| `/load` | Load memories from disk. |
| `/help` | Show all available commands. |
| `quit` | Exit the application (auto-saves memories). |
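Dispatching these commands comes down to splitting a slash-prefixed token from its argument. A minimal sketch, where `ParseCommand` is a hypothetical helper and not part of the LM-Kit API:

```csharp
using System;

// Hypothetical helper: split "/command argument" into its two parts.
// Input that does not start with '/' is treated as a plain chat message.
static (string Command, string Argument) ParseCommand(string input)
{
    input = input.Trim();
    if (!input.StartsWith('/'))
        return (string.Empty, input);

    int space = input.IndexOf(' ');
    return space < 0
        ? (input, string.Empty)
        : (input[..space], input[(space + 1)..].Trim());
}

var (cmd, arg) = ParseCommand("/remember I prefer dark mode");
Console.WriteLine($"command={cmd} argument={arg}");
```

The returned command token can then drive a `switch` over `/remember`, `/memories`, `/save`, and the other commands, with non-command input flowing to the agent.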
### Conversation Flow

- Startup: model selection and loading (chat + embedding).
- Memory load: attempts to load existing memories from disk.
- Chat loop:
  - User enters a message or command.
  - Commands are processed immediately.
  - Messages trigger memory retrieval and response generation.
  - New facts are automatically extracted and stored.
- Exit: memories auto-save on quit.
## Memory Types Explained
| Type | Purpose | Examples |
|---|---|---|
| Semantic | Facts and knowledge | "Alex works at TechCorp", "User prefers TypeScript" |
| Episodic | Events and experiences | "Discussed API design on Monday", "Project deadline next week" |
| Procedural | Preferences and processes | "Always format code in C#", "Prefers detailed explanations" |
Memory types are automatically assigned based on conversation patterns:

```csharp
var patterns = new[]
{
    ("my name is ", MemoryType.Semantic),
    ("i work ", MemoryType.Semantic),
    ("i prefer ", MemoryType.Procedural),
    ("yesterday ", MemoryType.Episodic),
    ("my project ", MemoryType.Episodic),
};
```
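A classifier driven by such a pattern table might look like the sketch below. `ClassifyMemory` is a hypothetical helper, and falling back to the semantic type when nothing matches is an assumption; the sample's actual extraction logic may differ:

```csharp
using System;

// Hypothetical pattern-based classifier mirroring the pattern table above.
// Returns the memory type name; defaults to "Semantic" when no pattern matches.
static string ClassifyMemory(string text)
{
    var lowered = text.ToLowerInvariant();
    var patterns = new (string Prefix, string Type)[]
    {
        ("my name is ", "Semantic"),
        ("i work ",     "Semantic"),
        ("i prefer ",   "Procedural"),
        ("yesterday ",  "Episodic"),
        ("my project ", "Episodic"),
    };

    foreach (var (prefix, type) in patterns)
        if (lowered.Contains(prefix))
            return type;

    return "Semantic"; // assumed default for unmatched statements
}

Console.WriteLine(ClassifyMemory("I prefer dark mode in all applications"));
```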
## Example Conversations

### Building Memory Over Time

```
User: My name is Alex and I work as a software engineer at TechCorp.

Assistant: Nice to meet you, Alex! I'll remember that you're a software
engineer at TechCorp. What kind of projects do you work on?

[Stored new semantic memory]

User: I mainly work on backend services using C# and .NET.

Assistant: Got it! So you focus on backend development with C# and .NET
at TechCorp. That's a solid tech stack.

[Stored new semantic memory]

[Later, or in a new session]

User: What do you know about me?

Assistant: Based on what you've told me, you're Alex, a software engineer
at TechCorp who specializes in backend services using C# and .NET.
```
### Using Commands

```
User: /remember I prefer dark mode in all applications
Stored in memory: "I prefer dark mode in all applications"

User: /memories
Memory sources (2):
  [Semantic] semantic_memories - 3 section(s)
  [Procedural] procedural_memories - 1 section(s)

User: /save
Saved 2 memory sources to ./agent_memory.bin
```
## Agent Configuration

```csharp
using LMKit.Agents;
using LMKit.Model;

// Load models
LM chatModel = new LM(new Uri(chatModelUri));
LM embeddingModel = new LM(new Uri(embeddingModelUri));

// Create a new memory store, or load one persisted by a previous session
var memory = File.Exists(MEMORY_FILE_PATH)
    ? AgentMemory.Deserialize(MEMORY_FILE_PATH, embeddingModel)
    : new AgentMemory(embeddingModel);

// Build the agent with memory attached
var agent = Agent.CreateBuilder(chatModel)
    .WithPersona(@"You are a helpful personal assistant with persistent memory.
You remember information users share with you and use it to provide personalized assistance.

When users share personal information (name, preferences, projects, etc.):
- Acknowledge and confirm what you've learned
- Use this information naturally in future responses

When answering questions:
- Check if you have relevant memories that could help
- Reference past conversations when appropriate
- Be conversational and personable")
    .WithPlanning(PlanningStrategy.None)
    .WithMemory(memory)
    .Build();

// Execute a conversation turn
var executor = new AgentExecutor();
var result = executor.Execute(agent, userInput, cancellationToken);

// Store new information from the conversation
memory.SaveInformation(
    dataSourceId: "conversation_memories",
    content: "User said: I prefer dark mode",
    sectionId: $"conversation_{DateTime.Now.Ticks}",
    memoryType: MemoryType.Procedural);

// Save to disk
memory.Serialize(MEMORY_FILE_PATH);
```
## Architecture

```
┌─────────────────────────────────────────────────┐
│                 User Message                    │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│               Memory Retrieval                  │
│   (Semantic search for relevant memories)       │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│             Context Enhancement                 │
│   (Inject relevant memories into prompt)        │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│               Agent Response                    │
└─────────────────────┬───────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────┐
│               Memory Storage                    │
│   (Store new facts from conversation)           │
└─────────────────────────────────────────────────┘
```
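The four stages can be exercised end to end with stub logic. In the sketch below, `Retrieve` and `BuildPrompt` are hypothetical names, and keyword overlap stands in for LM-Kit's embedding-based semantic search; it illustrates the data flow only, not the real retrieval mechanism:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stage 1: memory retrieval. Real retrieval is semantic (vector similarity
// over bge-m3 embeddings); simple keyword overlap stands in here.
static List<string> Retrieve(List<string> memories, string query) =>
    memories
        .Where(m => query.Split(' ')
            .Any(w => w.Length > 3 && m.Contains(w, StringComparison.OrdinalIgnoreCase)))
        .ToList();

// Stage 2: context enhancement - inject relevant memories into the prompt.
static string BuildPrompt(string query, List<string> relevant) =>
    $"Known facts:\n{string.Join("\n", relevant)}\n\nUser: {query}";

var store = new List<string> { "Alex works at TechCorp", "User prefers TypeScript" };
string query = "Where does Alex work?";

var relevant = Retrieve(store, query);
string prompt = BuildPrompt(query, relevant);

// Stage 3 (agent response) would send `prompt` to the chat model;
// stage 4 (memory storage) would append newly extracted facts to `store`.
Console.WriteLine(prompt);
```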
## Memory Storage Location

Memories are saved to `./agent_memory.bin`.
The binary format includes:
- All data sources with their sections
- Embedded vectors for semantic search
- Memory type classifications
## Behavior & Policies
- Model loading: requires both chat model and embedding model.
- Memory retrieval: automatic RAG-based context enhancement per query.
- Fact extraction: pattern-based detection of information worth storing.
- Auto-save: memories automatically saved on graceful exit.
- Memory isolation: each session can have independent memory instances.
- Licensing: set an optional license key via `LicenseManager.SetLicenseKey("")`.
## Getting Started

### Prerequisites

- .NET 8.0 or later
- Sufficient VRAM for the chat model + embedding model (~5-12 GB total)

### Download

```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/agents/persistent_memory_assistant
```

### Run

```bash
dotnet build
dotnet run
```
Then:
- Select a model by typing 0-4, or paste a custom model URI.
- Wait for models to download (first run) and load.
- Chat naturally and share information about yourself.
- Use commands to manage memories explicitly.
- Memories auto-load on restart.
- Type `quit` to exit (memories auto-save).
## Troubleshooting

### "No memories stored yet"
- Share some information in conversation first.
- Use `/remember <info>` to store explicitly.
### Memory not persisting

- Ensure you exit with `quit` for auto-save.
- Use `/save` to save manually.
- Check write permissions for `./agent_memory.bin`.
### Slow memory retrieval
- Embedding model needs to be loaded (adds ~1.5 GB VRAM).
- First query may be slower as embeddings are computed.
### Out-of-memory errors
- Total VRAM = chat model + embedding model.
- Pick a smaller chat model if needed.
### Assistant doesn't remember
- Information may not match extraction patterns.
- Use `/remember` to store explicitly.
- Check `/memories` to see what's stored.
## Extend the Demo
- Custom extraction: use LLM to extract entities instead of patterns.
- Memory importance: add relevance scoring and memory decay.
- Memory categories: create domain-specific memory types.
- Cloud persistence: store memories in databases or cloud storage.
- Memory sharing: transfer memories between agents or sessions.
- Selective forgetting: implement commands to remove specific memories.