👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/local-inference/context-hibernation/idle_session_hibernation

Context Hibernation for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that demonstrates IKVCache.HibernateAsync(): serialize a populated MultiTurnConversation KV-cache to disk, free its RAM/VRAM, then auto-rehydrate on the next Submit(). The demo prints Residency and Process.WorkingSet64 so the memory drop is visible.

All inference runs on-device.
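The working-set figure the demo prints can be read with the standard System.Diagnostics API. The helper below is a minimal sketch, not the demo's actual code: MemoryProbe and its method names are hypothetical, and only Process.WorkingSet64 comes from the description above.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // Returns the current process working set in mebibytes.
    public static double WorkingSetMiB()
    {
        using var process = Process.GetCurrentProcess();
        process.Refresh(); // drop cached values so the reading is current
        return process.WorkingSet64 / (1024.0 * 1024.0);
    }

    // Formats a before/after pair so the change is easy to spot in console output.
    public static string FormatDelta(double beforeMiB, double afterMiB)
    {
        double delta = afterMiB - beforeMiB;
        return FormattableString.Invariant(
            $"{beforeMiB:F1} MiB -> {afterMiB:F1} MiB ({delta:+0.0;-0.0;0.0} MiB)");
    }
}
```

Take one reading before hibernating and one after, then print FormatDelta(before, after). The working set is managed by the operating system, so the drop may lag the hibernate call or be smaller than the cache size.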


👥 Industry Target Audience

  • Multi-tenant servers running many concurrent chat sessions.
  • Mobile / desktop apps that need to survive long background periods.
  • Agent workflows where context grows to hundreds of MB per session.
  • Workstation / IDE assistants: free GPU between user interactions.

🚀 Problem Solved

A KV-cache is the working memory the runtime keeps for an inference session. In a busy multi-turn chat it grows with every turn. Without hibernation, the only way to free that memory is to discard the conversation. With hibernation, the conversation stays warm on disk and you pay the rehydration cost only when the session is accessed again.


💻 Application Overview

Interactive menu (no command-line arguments) with five modes plus Quit:

Mode        What it does
Start       Create a fresh MultiTurnConversation.
Ask         REPL of free-form turns. Each turn reports residency and working set.
Hibernate   Hibernate the current context to disk; report file size and memory drop.
Scripted    Canonical demo: three turns, then hibernate, then rehydrate.
State       Print residency, working set, and a KV-cache preview.
Quit        Exit.

The model loads once at startup. Configuration.ContextHibernationDirectory is set to %TEMP%/lmkit-hibernation-demo so the hibernation files are easy to inspect.
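To look at what lands in that directory, a small helper using only System.IO is enough. This is a sketch under assumptions: HibernationFiles and its methods are hypothetical names, and the on-disk file naming used by the library is not stated here, so the helper simply lists whatever files it finds.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HibernationFiles
{
    // Resolves <temp>/lmkit-hibernation-demo, the location named in the description above.
    public static string DemoDirectory() =>
        Path.Combine(Path.GetTempPath(), "lmkit-hibernation-demo");

    // Returns one "name (size MiB)" line per file in the directory,
    // or an empty list if the directory does not exist yet.
    public static List<string> Describe(string directory)
    {
        var lines = new List<string>();
        if (!Directory.Exists(directory))
        {
            return lines;
        }

        foreach (string path in Directory.GetFiles(directory))
        {
            var info = new FileInfo(path);
            double sizeMiB = info.Length / (1024.0 * 1024.0);
            lines.Add(FormattableString.Invariant($"{info.Name} ({sizeMiB:F2} MiB)"));
        }
        return lines;
    }
}
```

Calling HibernationFiles.Describe(HibernationFiles.DemoDirectory()) after the Hibernate mode gives a quick view of how much disk each hibernated context uses.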

✨ Key Features

  • LMKit.Inference.IKVCache, which exposes Residency, HibernateAsync(filePath = null), and KVCacheContent.
  • ContextResidency enum: NotCreated, InMemory, Hibernated.
  • Configuration.ContextHibernationDirectory global setting.
  • Auto-rehydration on next inference call. No caller code change required.
  • Coalesced requests: concurrent hibernate calls share one task.
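The coalescing behaviour in the last bullet can be illustrated with a generic pattern: while one operation is in flight, every additional caller receives the same Task instead of starting a new one. The class below is a standalone sketch of that idea. CoalescedOperation is a hypothetical name, and this is not a description of how the library implements it internally.

```csharp
using System;
using System.Threading.Tasks;

public sealed class CoalescedOperation
{
    private readonly Func<Task> _operation;
    private readonly object _gate = new object();
    private Task _inFlight = Task.CompletedTask;

    public CoalescedOperation(Func<Task> operation)
    {
        _operation = operation ?? throw new ArgumentNullException(nameof(operation));
    }

    // Starts the operation, or returns the task already in flight
    // if a previous call has not completed yet.
    public Task RunAsync()
    {
        lock (_gate)
        {
            if (!_inFlight.IsCompleted)
            {
                return _inFlight; // share the pending task with this caller
            }

            _inFlight = _operation();
            return _inFlight;
        }
    }
}
```

The lock only guards the decision to start or share. The operation itself is expected to return its Task promptly and do the slow work asynchronously.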

🧠 Model

  • qwen3.5:0.8b (smallest current Qwen 3.5, ~600 MB, fast to load).

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later
  • Enough free disk space for hibernation files (can be hundreds of MB on larger models with long histories).

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/local-inference/context-hibernation/idle_session_hibernation
dotnet run

Pick a mode from the menu.

🚀 Extend the Demo

  • Wire HibernateAsync() into your app's "user idle" event so RAM is freed automatically after, say, two minutes of inactivity.
  • Persist the hibernation path in a database keyed by userId so each user has a dedicated warm conversation across server restarts.
  • Combine with Multi-GPU + Tensor Overrides to free RAM and VRAM for very large models.
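The idle-event idea in the first bullet can be sketched with a one-shot timer that restarts on every user interaction. IdleTrigger is a hypothetical helper, and the onIdle delegate stands in for whatever call hibernates the session in your app (for example, a call to HibernateAsync() on the conversation's KV-cache). How that cache object is obtained is not covered here, so the sketch leaves it to the caller.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class IdleTrigger : IDisposable
{
    private readonly TimeSpan _idleAfter;
    private readonly Timer _timer;

    // onIdle runs once after idleAfter elapses with no call to Touch().
    public IdleTrigger(TimeSpan idleAfter, Func<Task> onIdle)
    {
        _idleAfter = idleAfter;
        _timer = new Timer(async _ =>
        {
            try
            {
                await onIdle();
            }
            catch (Exception ex)
            {
                // A timer callback has no caller to observe the failure, so log it.
                Console.Error.WriteLine($"Idle callback failed: {ex.Message}");
            }
        }, null, idleAfter, Timeout.InfiniteTimeSpan);
    }

    // Call on every user interaction to restart the idle countdown.
    public void Touch() => _timer.Change(_idleAfter, Timeout.InfiniteTimeSpan);

    public void Dispose() => _timer.Dispose();
}
```

With a two-minute timeout, construct the trigger as new IdleTrigger(TimeSpan.FromMinutes(2), hibernate) and call Touch() whenever the user submits a turn. Because rehydration happens automatically on the next inference call, no matching "wake" hook is needed.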