👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/local-inference/context-hibernation/idle_session_hibernation

Context Hibernation for C# .NET Applications

🎯 Purpose of the Demo

An interactive console app that demonstrates IKVCache.HibernateAsync(): serialize a populated MultiTurnConversation KV-cache to disk, free its RAM/VRAM, then auto-rehydrate on the next Submit(). The demo prints Residency and Process.WorkingSet64 so the memory drop is visible.
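To make the memory drop visible in your own code, a small helper around Process.WorkingSet64 is enough. The sketch below is not taken from the demo source: MemoryProbe and its method names are hypothetical, and only standard .NET APIs are used. The operating system decides when freed pages actually leave the working set, so the reported drop may lag slightly behind the hibernate call.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Hypothetical helper (not part of LM-Kit): reads the process working set
// so it can be printed before and after a hibernate call.
public static class MemoryProbe
{
    // Current working set of this process, in bytes.
    public static long WorkingSetBytes()
    {
        using var process = Process.GetCurrentProcess();
        process.Refresh(); // make sure the counters are a fresh snapshot
        return process.WorkingSet64;
    }

    // Formats a byte count as mebibytes with one decimal, culture-independent.
    public static string FormatMiB(long bytes) =>
        (bytes / (1024.0 * 1024.0)).ToString("F1", CultureInfo.InvariantCulture) + " MiB";
}

// Intended use (kvCache stands for the session's IKVCache; how it is
// obtained from MultiTurnConversation is not shown in this README):
//
//   long before = MemoryProbe.WorkingSetBytes();
//   await kvCache.HibernateAsync();
//   long after = MemoryProbe.WorkingSetBytes();
//   Console.WriteLine($"{MemoryProbe.FormatMiB(before)} -> {MemoryProbe.FormatMiB(after)}");
```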

All inference runs on-device.

👥 Industry Target Audience

Multi-tenant servers running many concurrent chat sessions.
Mobile / desktop apps that need to survive long background periods.
Agent workflows where context grows to hundreds of MB per session.
Workstation / IDE assistants that need to free GPU memory between user interactions.

🚀 Problem Solved

A KV-cache holds the attention keys and values the runtime keeps in memory for an inference session. In a busy multi-turn chat, it grows with every turn. Without hibernation, the only way to free that memory is to discard the conversation. With hibernation, the conversation state stays on disk and you pay the rehydration cost only when the session is used again.

💻 Application Overview

Interactive menu (no command-line arguments) with five modes plus Quit:

Mode	What it does
Start	Create a fresh `MultiTurnConversation`.
Ask	REPL of free-form turns. Each turn reports residency and working set.
Hibernate	Hibernate the current context to disk; report file size and memory drop.
Scripted	Three-turn + hibernate + rehydrate canonical demo.
State	Print residency, working set, and a KV-cache preview.
Quit	Exit.

The model loads once at startup. Configuration.ContextHibernationDirectory is set to %TEMP%/lmkit-hibernation-demo for inspection.
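The directory setup described above could look roughly like the following. This is a sketch under assumptions: HibernationDirectory is a hypothetical helper, the README does not say which namespace Configuration lives in, and it does not say whether LM-Kit creates the folder itself, so the helper creates it up front.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of LM-Kit): resolves and inspects the
// folder used for hibernation files.
public static class HibernationDirectory
{
    // Builds <temp>/<folderName> and creates it if it does not exist yet.
    public static string Resolve(string folderName = "lmkit-hibernation-demo")
    {
        string path = Path.Combine(Path.GetTempPath(), folderName);
        Directory.CreateDirectory(path); // no-op when the folder already exists
        return path;
    }

    // Sum of the sizes of all files directly inside the folder, in bytes.
    public static long TotalBytes(string directory) =>
        new DirectoryInfo(directory).EnumerateFiles().Sum(file => file.Length);
}

// Intended use (property name as given in this README):
//
//   Configuration.ContextHibernationDirectory = HibernationDirectory.Resolve();
//   ...
//   Console.WriteLine($"On disk: {HibernationDirectory.TotalBytes(dir)} bytes");
```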

✨ Key Features

LMKit.Inference.IKVCache with Residency, HibernateAsync(filePath = null), KVCacheContent.
ContextResidency enum: NotCreated, InMemory, Hibernated.
Configuration.ContextHibernationDirectory global setting.
Auto-rehydration on next inference call. No caller code change required.
Coalesced requests: concurrent hibernate calls share one task.
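For readers unfamiliar with request coalescing, the general pattern can be sketched in a few lines. This illustrates the idea only and makes no claim about how LM-Kit implements it internally; CoalescedOperation is a hypothetical name.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of request coalescing: callers that arrive
// while an operation is still running receive the same in-flight task
// instead of starting a second one.
public sealed class CoalescedOperation
{
    private readonly Func<Task> _operation;
    private readonly object _gate = new object();
    private Task _inFlight = Task.CompletedTask;

    public CoalescedOperation(Func<Task> operation)
    {
        _operation = operation ?? throw new ArgumentNullException(nameof(operation));
    }

    public Task RunAsync()
    {
        lock (_gate)
        {
            // Start a new run only when the previous one has finished.
            if (_inFlight.IsCompleted)
            {
                _inFlight = _operation();
            }
            return _inFlight;
        }
    }
}
```

The delegate is invoked inside the lock, so it should return its task quickly and do the heavy work asynchronously.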

🧠 Model

qwen3.5:0.8b (smallest current Qwen 3.5, ~600 MB, fast to load).

🛠️ Getting Started

📋 Prerequisites

.NET 8.0 or later
Enough free disk space for hibernation files (can be hundreds of MB on larger models with long histories).

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/local-inference/context-hibernation/idle_session_hibernation
dotnet run

Pick a mode from the menu.

🚀 Extend the Demo

Wire HibernateAsync() into your app's "user idle" event so RAM is freed automatically after, say, two minutes of inactivity.
Persist the hibernation path in a database keyed by userId so each user has a dedicated warm conversation across server restarts.
Combine with Multi-GPU + Tensor Overrides to free RAM and VRAM for very large models.
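The "user idle" idea from the first bullet can be sketched with a one-shot timer that is pushed back on every user interaction. IdleTrigger is a hypothetical class, not part of LM-Kit; it takes any asynchronous callback, so the hibernate call is passed in by the caller.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs a callback once no activity has been
// reported for the configured idle period.
public sealed class IdleTrigger : IDisposable
{
    private readonly Timer _timer;
    private readonly TimeSpan _idleAfter;

    public IdleTrigger(TimeSpan idleAfter, Func<Task> onIdle)
    {
        if (onIdle == null) throw new ArgumentNullException(nameof(onIdle));
        _idleAfter = idleAfter;
        // One-shot timer: fires once after idleAfter unless Touch() resets it.
        _timer = new Timer(state => { _ = RunSafely(onIdle); }, null,
                           idleAfter, Timeout.InfiniteTimeSpan);
    }

    // Call on every user interaction to restart the idle countdown.
    public void Touch() => _timer.Change(_idleAfter, Timeout.InfiniteTimeSpan);

    // Keeps exceptions from the callback off the timer thread.
    private static async Task RunSafely(Func<Task> onIdle)
    {
        try
        {
            await onIdle().ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Idle callback failed: {ex.Message}");
        }
    }

    public void Dispose() => _timer.Dispose();
}

// Intended use (kvCache stands for the session's IKVCache; how it is
// obtained is not shown in this README):
//
//   using var idle = new IdleTrigger(TimeSpan.FromMinutes(2),
//                                    async () => await kvCache.HibernateAsync());
//   // ... call idle.Touch() whenever the user sends a message.
```

Because the next inference call rehydrates automatically, the callback only needs to hibernate; no matching "wake up" hook is required.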
