👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/local-inference/context-hibernation/idle_session_hibernation
Context Hibernation for C# .NET Applications
🎯 Purpose of the Demo
An interactive console app that demonstrates IKVCache.HibernateAsync(): serialize a populated MultiTurnConversation KV-cache to disk, free its RAM/VRAM, then auto-rehydrate on the next Submit(). The demo prints Residency and Process.WorkingSet64 so the memory drop is visible.
All inference runs on-device.
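The working-set readings the demo prints can be reproduced with the standard System.Diagnostics API. The helper below is a minimal sketch, not the demo's actual code: MemoryProbe and its members are hypothetical names, and the action passed in stands for whatever call you want to measure.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical helper (not part of LM-Kit): samples the process working set
// around an asynchronous action so a memory drop can be reported.
public static class MemoryProbe
{
    // Current working set of this process, in bytes.
    public static long WorkingSetBytes()
    {
        using var process = Process.GetCurrentProcess();
        process.Refresh(); // drop cached counters before reading
        return process.WorkingSet64;
    }

    // Runs the action and returns the working set sampled before and after it.
    public static async Task<(long Before, long After)> MeasureAsync(Func<Task> action)
    {
        long before = WorkingSetBytes();
        await action();
        long after = WorkingSetBytes();
        return (before, after);
    }

    // Formats a byte count as mebibytes with one decimal, culture-independent.
    public static string ToMiB(long bytes) =>
        (bytes / (1024.0 * 1024.0)).ToString("F1", CultureInfo.InvariantCulture) + " MiB";
}
```

Assuming HibernateAsync() returns a Task, wrapping it in MeasureAsync gives the two readings to compare. The working set is managed by the operating system, so the drop may appear with a short delay or be smaller than the hibernation file size.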
👥 Industry Target Audience
- Multi-tenant servers running many concurrent chat sessions.
- Mobile / desktop apps that need to survive long background periods.
- Agent workflows where context grows to hundreds of MB per session.
- Workstation / IDE assistants: free GPU between user interactions.
🚀 Problem Solved
A KV-cache is the in-flight memory the runtime keeps for an inference session. In a busy multi-turn chat it grows with every turn. Without hibernation, the only way to free that memory is to discard the conversation. With hibernation, you keep the conversation warm on disk and pay the rehydration cost only when the session is accessed again.
💻 Application Overview
Interactive menu (no command-line arguments) with five modes and a Quit option:
| Mode | What it does |
|---|---|
| Start | Create a fresh MultiTurnConversation. |
| Ask | REPL of free-form turns. Each turn reports residency and working set. |
| Hibernate | Hibernate the current context to disk; report file size and memory drop. |
| Scripted | Three-turn + hibernate + rehydrate canonical demo. |
| State | Print residency, working set, and a KV-cache preview. |
| Quit | Exit. |
The model loads once at startup. Configuration.ContextHibernationDirectory is set to %TEMP%/lmkit-hibernation-demo for inspection.
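To inspect what ends up in that folder, something like the following could be used. It is a sketch under assumptions: HibernationFolder is a hypothetical helper, Path.GetTempPath() is used as the cross-platform counterpart of %TEMP%, and the assignment to Configuration.ContextHibernationDirectory appears only in a comment because its exact namespace is not given here.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of LM-Kit) for locating and sizing
// the demo's hibernation folder.
public static class HibernationFolder
{
    // Folder name taken from the text above; GetTempPath() resolves to
    // %TEMP% on Windows and to the platform temp directory elsewhere.
    public static string DemoPath() =>
        Path.Combine(Path.GetTempPath(), "lmkit-hibernation-demo");

    // Total size in bytes of the files directly inside the directory,
    // or 0 when the directory does not exist yet.
    public static long TotalBytes(string directory)
    {
        if (!Directory.Exists(directory)) return 0;

        long total = 0;
        foreach (string file in Directory.EnumerateFiles(directory))
            total += new FileInfo(file).Length;
        return total;
    }
}

// Assumed usage, based on the setting named above:
//   Configuration.ContextHibernationDirectory = HibernationFolder.DemoPath();
```

Calling TotalBytes(HibernationFolder.DemoPath()) before and after a hibernate gives a rough view of how much disk each hibernated context takes.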
✨ Key Features
- LMKit.Inference.IKVCache with Residency, HibernateAsync(filePath = null), KVCacheContent.
- ContextResidency enum: NotCreated, InMemory, Hibernated.
- Configuration.ContextHibernationDirectory global setting.
- Auto-rehydration on next inference call. No caller code change required.
- Coalesced requests: concurrent hibernate calls share one task.
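Request coalescing, as a general pattern, can be sketched as follows. This illustrates the idea only and is not LM-Kit's implementation: CoalescedOperation is a hypothetical class, and the wrapped delegate stands in for the hibernate call.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of coalescing: callers that arrive while the
// operation is still running receive the same in-flight Task.
public sealed class CoalescedOperation
{
    private readonly Func<Task> _operation;
    private readonly object _gate = new object();
    private Task _inFlight = Task.CompletedTask;

    public CoalescedOperation(Func<Task> operation) => _operation = operation;

    public Task RunAsync()
    {
        lock (_gate)
        {
            // Start a new run only when no earlier run is still pending.
            if (_inFlight.IsCompleted)
                _inFlight = _operation();
            return _inFlight;
        }
    }
}
```

With this shape, two threads that both decide a session is idle trigger a single write to disk, and both can await its completion.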
🧠 Model
qwen3.5:0.8b (smallest current Qwen 3.5, ~600 MB, fast to load).
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
- Enough free disk space for hibernation files (can be hundreds of MB on larger models with long histories).
▶️ Running the Application
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/local-inference/context-hibernation/idle_session_hibernation
dotnet run
Pick a mode from the menu.
🚀 Extend the Demo
- Wire HibernateAsync() into your app's "user idle" event so RAM is freed automatically after, say, two minutes of inactivity.
- Persist the hibernation path in a database keyed by userId so each user has a dedicated warm conversation across server restarts.
- Combine with Multi-GPU + Tensor Overrides to free RAM and VRAM for very large models.
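The "user idle" idea in the first item could start from a small tracker like the one below. It is a sketch under assumptions: IdleSessionTracker is a hypothetical class, sessions are keyed by a userId string, and the caller decides what to do with each idle session (for example, call HibernateAsync() on its cache and store the resulting path).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper (not part of LM-Kit): remembers when each user was
// last active and reports the users whose sessions have gone idle.
public sealed class IdleSessionTracker
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _lastSeen = new();
    private readonly TimeSpan _idleAfter;

    public IdleSessionTracker(TimeSpan idleAfter) => _idleAfter = idleAfter;

    // Record activity for a user, e.g. on every submitted turn.
    public void Touch(string userId, DateTimeOffset now) => _lastSeen[userId] = now;

    // Returns users idle for at least the threshold and stops tracking them.
    // The key/value overload of TryRemove skips entries that were touched
    // again between the check and the removal.
    public List<string> CollectIdle(DateTimeOffset now)
    {
        var idle = new List<string>();
        foreach (KeyValuePair<string, DateTimeOffset> entry in _lastSeen)
        {
            if (now - entry.Value >= _idleAfter && _lastSeen.TryRemove(entry))
                idle.Add(entry.Key);
        }
        return idle;
    }
}
```

A periodic timer could call CollectIdle(DateTimeOffset.UtcNow) and hibernate each returned session. Passing the clock value in as a parameter keeps the tracker easy to test.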