Interface IKVCache
Contract for objects that own an inference KV-cache whose residency in memory can be observed and, optionally, hibernated to disk.
public interface IKVCache
Remarks
The KV-cache is the runtime memory an LLM uses to hold the intermediate key/value tensors produced while processing tokens. It grows with the prompt and conversation history and, for long-running sessions, can occupy significant RAM or VRAM. This interface lets callers:
- Inspect the text currently represented in the cache via KVCacheContent.
- Observe where the underlying context currently lives via Residency.
- Offload the cache and its owning context to disk via HibernateAsync(string). The cache is rehydrated transparently on the next inference call.
Implementations include MultiTurnConversation. Cast an instance to IKVCache to access this functionality:
var chat = new MultiTurnConversation(model);
// ... use the chat ...
if (chat is IKVCache cache && cache.Residency == ContextResidency.InMemory)
{
    // Free RAM; state is restored automatically on next use.
    // HibernateAsync takes a string (see the Methods list); "session.kv"
    // is an illustrative on-disk location.
    await cache.HibernateAsync("session.kv");
}
Properties
- KVCacheContent
Gets the textual content currently represented in the KV-cache, reconstructed from the tokens it holds. Returns an empty string when the cache is empty or when no message has yet been processed.
- Residency
Gets the current residency of the underlying inference context: whether it has not yet been created, is live in RAM, or has been hibernated to disk. See ContextResidency for details.
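A minimal sketch of reading both properties together, assuming a MultiTurnConversation instance as in the Remarks example (the `model` variable and the inspection logic are illustrative, not part of the contract):

var chat = new MultiTurnConversation(model);

if (chat is IKVCache cache)
{
    // KVCacheContent is an empty string until at least one
    // message has been processed.
    string cached = cache.KVCacheContent;

    // Residency reports where the underlying context lives:
    // not yet created, live in RAM, or hibernated to disk.
    if (cache.Residency == ContextResidency.InMemory)
    {
        Console.WriteLine($"Cache holds {cached.Length} characters in RAM.");
    }
}

Checking Residency before acting avoids redundant work, e.g. hibernating a context that was never created or has already been offloaded.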
Methods
- HibernateAsync(string)
Schedules hibernation of the inference context on a background thread and returns a task representing its completion.
- Warmup()
Eagerly ensures the underlying inference context is initialized and resident in memory.
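The two methods pair naturally into a warm-up/hibernate cycle. A hedged sketch, assuming the string argument to HibernateAsync(string) names an on-disk location (the member list above documents the parameter type but not its meaning):

if (chat is IKVCache cache)
{
    // Eagerly create the context so the first inference call
    // does not pay the initialization cost.
    cache.Warmup();

    // ... run inference ...

    // Schedule hibernation on a background thread; the returned
    // task completes when the context has been offloaded.
    // "session.kv" is an illustrative value for the string argument.
    await cache.HibernateAsync("session.kv");

    // The cache is rehydrated transparently on the next inference
    // call, or eagerly by calling Warmup() again.
}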