👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/agents/persistent_memory_assistant
Persistent Memory Assistant for C# .NET Applications
🎯 Purpose of the Demo
Persistent Memory Assistant demonstrates how to use LM-Kit.NET to build an AI assistant with long-term memory that persists across conversation sessions using the AgentMemory system with built-in automatic memory extraction.
The sample shows how to:
- Enable LLM-based memory extraction via `MemoryExtractionMode.LlmBased` on `AgentMemory`.
- Automatically extract and store facts from conversations using grammar-constrained structured output.
- Load and configure embedding models for memory retrieval.
- Persist memories to disk and reload them across sessions.
- Subscribe to the `BeforeMemoryStored` event to inspect or filter extracted memories.
- Use RAG-based retrieval to enhance responses with relevant memories.
Why Agent Memory with LM-Kit.NET?
- Automatic learning: the agent extracts facts, preferences, and context from every conversation turn without custom code.
- Personalization: the assistant remembers user preferences, projects, and context.
- Context continuity: conversations build on prior interactions across sessions.
- Local-first: all memory storage, extraction, and retrieval runs on your hardware.
- Semantic search: find relevant memories based on meaning, not keywords.
- Deduplication: duplicate facts are automatically detected and skipped.
👥 Target Audience
- Product Developers: build personalized AI assistants that learn user preferences.
- Enterprise Teams: create context-aware assistants for ongoing projects.
- CRM & Support: develop assistants that remember customer interactions.
- Personal Productivity: build AI companions that understand your workflow over time.
- AI/ML Engineers: explore RAG-based memory systems with local inference.
🚀 What Problem It Solves
- Context across sessions: assistant remembers information from previous conversations.
- Personalized responses: uses stored facts to provide tailored assistance.
- Organized memory: different memory types for facts, events, and preferences.
- Persistent storage: memories survive application restarts.
- Zero-code extraction: no custom pattern matching or regex needed.
💻 Demo Application Overview
Console app that:
- Lets you choose from 7 models suitable for conversational memory tasks.
- Loads both chat model and embedding model for memory operations.
- Creates an `Agent` with `AgentMemory` configured for automatic extraction.
- Loads existing memories from disk if available.
- Enters an interactive chat loop where you can:
- Chat naturally and share information about yourself.
- Use `/new` to start a fresh chat session while keeping all memories.
- Use `/remember` to explicitly store information.
- Use `/memories` to view stored memory sources.
- Use `/save` and `/load` to manage persistence.
- Automatically extracts facts from conversations using the LLM with structured output.
- Auto-saves memories on exit.
- Loops until you type `quit` to exit.
✨ Key Features
- Built-in LLM Extraction: automatic fact extraction using `MemoryExtractionMode.LlmBased` with no custom code.
- Three Memory Types: Semantic (facts), Episodic (events), Procedural (preferences), auto-classified by the LLM.
- Deduplication: similar memories are detected and skipped before storage.
- Capacity Limits and Eviction: configurable `MaxMemoryEntries` with automatic eviction of the oldest or least important entries when the limit is reached.
- Memory Consolidation: merge similar memories into concise summaries via `/consolidate`, using LLM-powered summarization with a configurable similarity threshold.
- Conversation Summarization: summarize the current session into episodic memory via `/summarize`, capturing key topics and decisions for future recall.
- Automatic Timestamps: every memory entry receives a `created_at` timestamp for eviction ordering and time-decay scoring.
- Time-Decay Scoring: configurable `TimeDecayHalfLife` so recent memories rank higher during retrieval.
- `BeforeMemoryStored` Event: inspect, modify, or cancel extracted memories before they are stored.
- `MemoryEvicted` Event: monitor or cancel evictions when capacity is exceeded.
- Disk Persistence: save/load memory state across application sessions.
- Command Interface: explicit control over memory operations, including `/capacity` to set limits at runtime.
- Embedding Model: dedicated model for semantic similarity search.
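Several of these knobs can be set directly on `AgentMemory`. A hedged configuration sketch: the property and event names (`MaxMemoryEntries`, `TimeDecayHalfLife`, `DeduplicationThreshold`, `MemoryEvicted`) come from the feature list above, but their exact types and the model URI are assumptions rather than verified API signatures.

```csharp
using System;
using LMKit.Agents.Memory;
using LMKit.Model;

// Hypothetical configuration sketch. Property names come from the feature
// list; exact types (e.g., TimeSpan vs. a plain day count for the half-life)
// are assumptions, and the model URI is a placeholder.
LM embeddingModel = new LM(new Uri("https://example.org/embeddinggemma-300m.gguf"));

var memory = new AgentMemory(embeddingModel)
{
    ExtractionMode = MemoryExtractionMode.LlmBased, // built-in LLM extraction
    MaxMemoryEntries = 500,                         // evict beyond this limit
    TimeDecayHalfLife = TimeSpan.FromDays(30),      // recent memories rank higher
    DeduplicationThreshold = 0.85                   // similarity cutoff for duplicates
};

// Observe (or veto) evictions once the capacity limit is reached.
memory.MemoryEvicted += (sender, args) =>
    Console.WriteLine("A memory entry was evicted to stay under MaxMemoryEntries.");
```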
Built-In Models (menu)
On startup, the sample shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | Google Gemma 3 12B | ~9 GB VRAM |
| 1 | Microsoft Phi-4 Mini 3.8B | ~3.3 GB VRAM |
| 2 | Meta Llama 3.1 8B | ~6 GB VRAM |
| 3 | Alibaba Qwen-3 8B | ~5.6 GB VRAM |
| 4 | Microsoft Phi-4 14.7B | ~11 GB VRAM |
| 5 | OpenAI GPT-OSS 20B | ~16 GB VRAM |
| 6 | Z.ai GLM 4.7 Flash 30B | ~18 GB VRAM |
| other | Custom model URI | depends on model |
Additional model loaded automatically:
- Embedding model: `embeddinggemma-300m` for semantic memory retrieval.
Total VRAM usage is chat model + embedding model (~300 MB for `embeddinggemma-300m`).
Commands & Flow
Interactive Commands
| Command | Description |
|---|---|
| `/new` | Start a new chat session (clears conversation history, keeps memories). |
| `/remember <info>` | Explicitly store information in memory. |
| `/memories` | List all stored memory sources and their sections. |
| `/capacity [n]` | Show or set maximum memory entries (0 for unlimited). |
| `/decay [days]` | Show or set time-decay half-life in days (0 to disable). |
| `/clear` | Clear all memories from the current session. |
| `/save` | Save memories to disk (`./agent_memory.bin`). |
| `/load` | Load memories from disk. |
| `/consolidate [threshold]` | Merge similar memories into consolidated entries. Optional similarity threshold (default: 0.85). |
| `/summarize` | Summarize the current conversation and store it as a memory. |
| `/help` | Show all available commands. |
| `quit` | Exit the application (auto-saves memories). |
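The command table maps naturally onto a small dispatch function in front of the chat loop. A minimal, self-contained sketch (the `Dispatch` helper and its routing labels are hypothetical; the actual sample may wire handlers differently):

```csharp
using System;

// Minimal command dispatcher sketch: routes the slash-commands from the
// table above and treats anything else as a chat message. The Dispatch
// helper and its labels are hypothetical illustration, not the sample's API.
static string Dispatch(string input)
{
    if (!input.StartsWith("/"))
        return input.Trim().Equals("quit", StringComparison.OrdinalIgnoreCase)
            ? "exit"
            : "chat";

    string command = input.Split(' ', 2)[0].ToLowerInvariant();
    return command switch
    {
        "/new" => "new-session",
        "/remember" => "store",
        "/memories" => "list",
        "/capacity" => "capacity",
        "/decay" => "decay",
        "/clear" => "clear",
        "/save" => "save",
        "/load" => "load",
        "/consolidate" => "consolidate",
        "/summarize" => "summarize",
        "/help" => "help",
        _ => "unknown"
    };
}

Console.WriteLine(Dispatch("/remember I prefer dark mode")); // store
Console.WriteLine(Dispatch("hello"));                        // chat
Console.WriteLine(Dispatch("quit"));                         // exit
```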
Conversation Flow
- Startup: Model selection and loading (chat + embedding).
- Memory Load: Attempts to load existing memories from disk.
- Chat Loop:
- User enters message or command.
- Commands are processed immediately.
- Messages trigger memory retrieval and response generation.
- New facts are automatically extracted by the LLM and stored (with deduplication).
- Exit: Memories auto-save on quit.
Memory Types Explained
| Type | Purpose | Examples |
|---|---|---|
| Semantic | Facts and knowledge | "Alex works at TechCorp", "User prefers TypeScript" |
| Episodic | Events and experiences | "Discussed API design on Monday", "Project deadline next week" |
| Procedural | Preferences and processes | "Always format code in C#", "Prefers detailed explanations" |
Memory types are automatically classified by the LLM during extraction. Each extracted fact includes a type, importance level, and category.
Example Conversations
Building Memory Over Time
```
User: My name is Alex and I work as a software engineer at TechCorp.

[Memory extracted: The user's name is Alex (Semantic, High)]
[Memory extracted: The user works as a software engineer at TechCorp (Semantic, Medium)]

Assistant: Nice to meet you, Alex! I'll remember that you're a software
engineer at TechCorp. What kind of projects do you work on?

User: I mainly work on backend services using C# and .NET.

[Memory extracted: The user works on backend services using C# and .NET (Semantic, Medium)]

Assistant: Got it! So you focus on backend development with C# and .NET
at TechCorp. That's a solid tech stack.

User: /new

Started a new chat session. Conversation history cleared, memories preserved.

User: What do you know about me?

Assistant: Based on what I recall, you're Alex, a software engineer
at TechCorp who specializes in backend services using C# and .NET.
```
Using Commands
```
User: /remember I prefer dark mode in all applications

Stored in memory: "I prefer dark mode in all applications"

User: /memories

Memory sources (2):
  [Semantic] semantic_memories - 3 section(s)
  [Semantic] user_memories - 1 section(s)

User: /save

Saved 2 memory sources to ./agent_memory.bin
```
Agent Configuration
```csharp
using LMKit.Agents;
using LMKit.Agents.Memory;
using LMKit.Model;

// Load models
LM chatModel = new LM(new Uri(chatModelUri));
LM embeddingModel = new LM(new Uri(embeddingModelUri));

// Create or load memory
var memory = File.Exists(MEMORY_FILE_PATH)
    ? AgentMemory.Deserialize(MEMORY_FILE_PATH, embeddingModel)
    : new AgentMemory(embeddingModel);

// Enable automatic LLM-based memory extraction
memory.ExtractionMode = MemoryExtractionMode.LlmBased;
memory.RunExtractionSynchronously = true;

// Optional: inspect what gets extracted
memory.BeforeMemoryStored += (sender, args) =>
{
    foreach (var mem in args.Memories)
        Console.WriteLine($"Extracted: {mem.Text} ({mem.MemoryType})");
};

// Build agent with memory
var agent = Agent.CreateBuilder(chatModel)
    .WithPersona("You are a helpful, conversational personal assistant.")
    .WithPlanning(PlanningStrategy.None)
    .WithMemory(memory)
    .Build();

// Execute conversation
var executor = new AgentExecutor();
var result = executor.Execute(agent, userInput);

// Save to disk
memory.Serialize(MEMORY_FILE_PATH);
```
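Tying the snippet above into an interactive session could look like the following sketch. It reuses the `memory`, `agent`, and `MEMORY_FILE_PATH` variables from that snippet; the loop structure and the auto-save on `quit` mirror the behavior described in this README, while printing `result` directly is an assumption about its `ToString()` output.

```csharp
// Hedged sketch of the chat loop around the configuration above; reuses
// the memory, agent, and MEMORY_FILE_PATH variables from that snippet.
var executor = new AgentExecutor();
while (true)
{
    Console.Write("> ");
    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input))
        continue;

    if (input.Trim().Equals("quit", StringComparison.OrdinalIgnoreCase))
    {
        memory.Serialize(MEMORY_FILE_PATH); // auto-save on exit
        break;
    }

    // Memory retrieval, response generation, and fact extraction all
    // happen inside Execute when the agent is built with WithMemory.
    var result = executor.Execute(agent, input);
    Console.WriteLine(result);
}
```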
🏗️ Architecture
```
+--------------------------------------------------+
|                   User Message                   |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|                 Memory Retrieval                 |
|     (Semantic search for relevant memories)      |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|               Context Enhancement                |
|      (Inject relevant memories into prompt)      |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|                  Agent Response                  |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|           Automatic Memory Extraction            |
|    (LLM-based structured extraction + dedup)     |
+--------------------------------------------------+
```
Memory Storage Location
Memories are saved to `./agent_memory.bin`.
The binary format includes:
- All data sources with their sections
- Embedded vectors for semantic search
- Memory type classifications
Behavior & Policies
- Model loading: requires both chat model and embedding model.
- Memory retrieval: automatic RAG-based context enhancement per query.
- Fact extraction: LLM-based with grammar-constrained structured output.
- Deduplication: extracted memories are checked against existing memory via vector similarity.
- Auto-save: memories automatically saved on graceful exit.
- Memory isolation: each session can have independent memory instances.
- Licensing: set an optional license key via `LicenseManager.SetLicenseKey("")`.
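The deduplication and time-decay policies can be illustrated with plain math, independent of LM-Kit. A small self-contained sketch (the vectors, the 0.85 threshold, and the half-life value are illustrative; in the sample, real embeddings come from the embedding model):

```csharp
using System;
using System.Linq;

// Illustrative math behind two policies above, independent of LM-Kit:
// (1) deduplication via cosine similarity, (2) time-decay scoring with a
// configurable half-life. All numbers here are made up for demonstration.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// A candidate memory counts as a duplicate if it is too similar to any
// stored embedding vector.
static bool IsDuplicate(double[] candidate, double[][] stored, double threshold = 0.85)
    => stored.Any(v => CosineSimilarity(candidate, v) >= threshold);

// Half-life decay: a memory's retrieval score halves every halfLifeDays;
// a half-life of 0 disables decay, matching the /decay command.
static double DecayedScore(double similarity, double ageDays, double halfLifeDays)
    => halfLifeDays <= 0 ? similarity : similarity * Math.Pow(0.5, ageDays / halfLifeDays);

double[][] existing = { new double[] { 1.0, 0.0 }, new double[] { 0.0, 1.0 } };
Console.WriteLine(IsDuplicate(new double[] { 0.99, 0.05 }, existing)); // True
Console.WriteLine(DecayedScore(0.8, 30, 30));                          // 0.4
```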
⚙️ Getting Started
Prerequisites
- .NET 8.0 or later
- Sufficient VRAM for chat model + embedding model (~4-19 GB total)
Download
```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/agents/persistent_memory_assistant
```
Run
```bash
dotnet build
dotnet run
```
Then:
- Select a model by typing 0-6, or paste a custom model URI.
- Wait for models to download (first run) and load.
- Chat naturally and share information about yourself.
- Use commands to manage memories explicitly.
- Memories auto-load on restart.
- Type `quit` to exit (memories auto-save).
🔧 Troubleshooting
"No memories stored yet"
- Share some information in conversation first.
- Use `/remember <info>` to store explicitly.
Memory not persisting
- Ensure you exit with `quit` for auto-save.
- Use `/save` to save manually.
- Check write permissions for `./agent_memory.bin`.
Slow memory retrieval
- Embedding model needs to be loaded (adds ~300 MB VRAM).
- First query may be slower as embeddings are computed.
Out-of-memory errors
- Total VRAM = chat model + embedding model.
- Pick a smaller chat model if needed.
Assistant doesn't remember
- Check `/memories` to see what's stored.
- Use `/remember` to store explicitly.
- Set `RunExtractionSynchronously = true` to ensure extraction completes before the next turn.
🚀 Extend the Demo
- Custom extraction model: set `ExtractionModel` to a lightweight model for faster extraction.
- Custom guidance: set `ExtractionPrompt` to focus extraction on specific domains.
- Memory filtering: use `BeforeMemoryStored` to apply custom business rules.
- Deduplication tuning: adjust `DeduplicationThreshold` for stricter or looser duplicate detection.
- Cloud persistence: store memories in databases or cloud storage.
- Memory sharing: transfer memories between agents or sessions.
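As a starting point for the first four extension ideas, a hedged sketch: the property names (`ExtractionModel`, `ExtractionPrompt`, `DeduplicationThreshold`) come from the list above, but their exact types are assumptions and the model URIs are placeholders.

```csharp
using System;
using LMKit.Agents.Memory;
using LMKit.Model;

// Hypothetical extension sketch; property types and model URIs are
// assumptions, not verified LM-Kit signatures.
LM embeddingModel = new LM(new Uri("https://example.org/embeddinggemma-300m.gguf"));
LM lightweightModel = new LM(new Uri("https://example.org/small-extraction-model.gguf"));

var memory = new AgentMemory(embeddingModel);

memory.ExtractionModel = lightweightModel;  // faster extraction on a smaller LM
memory.ExtractionPrompt =
    "Extract only facts about the user's projects, tools, and preferences.";
memory.DeduplicationThreshold = 0.90;       // stricter duplicate detection

// Apply a custom business rule before anything is stored.
memory.BeforeMemoryStored += (sender, args) =>
{
    foreach (var mem in args.Memories)
        Console.WriteLine($"Pending memory: {mem.Text}");
};
```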