Interface IKVCache
Contract for objects that own an inference KV-cache whose residency in memory can be observed and, optionally, hibernated to disk.
public interface IKVCache
Remarks
The KV-cache is the runtime memory an LLM uses to hold the intermediate key/value tensors produced while processing tokens. It grows with the prompt and conversation history and, for long-running sessions, can occupy significant RAM or VRAM. This interface lets callers:
- Inspect the text currently represented in the cache via KVCacheContent.
- Observe where the underlying context currently lives via Residency.
- Offload the cache and its owning context to disk via HibernateAsync(string). The cache is rehydrated transparently on the next inference call.
Implementations include MultiTurnConversation. Cast an instance to IKVCache to access this functionality:
var chat = new MultiTurnConversation(model);
// ... use the chat ...
if (chat is IKVCache cache && cache.Residency == ContextResidency.InMemory)
{
    // Free RAM; state is restored automatically on next use.
    // HibernateAsync takes a string (see the Methods list); "session.kv"
    // is an illustrative on-disk location.
    await cache.HibernateAsync("session.kv");
}
Properties
- KVCacheContent
Gets the textual content currently represented in the KV-cache, reconstructed from the tokens it holds. Returns an empty string when the cache is empty or when no message has yet been processed.
- Residency
Gets the current residency of the underlying inference context: whether it has not yet been created, is live in RAM, or has been hibernated to disk. See ContextResidency for details.
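A minimal sketch of reading both properties together, assuming a MultiTurnConversation instance as in the Remarks example (the `model` variable and the inspection logic are illustrative, not part of the contract):

var chat = new MultiTurnConversation(model);

if (chat is IKVCache cache)
{
    // KVCacheContent is an empty string until at least one
    // message has been processed.
    string cached = cache.KVCacheContent;

    // Residency reports where the underlying context lives:
    // not yet created, live in RAM, or hibernated to disk.
    if (cache.Residency == ContextResidency.InMemory)
    {
        Console.WriteLine($"Cache holds {cached.Length} characters in RAM.");
    }
}

Checking Residency before acting avoids redundant work, e.g. hibernating a context that was never created or has already been offloaded.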
Methods
- HibernateAsync(string)
Schedules hibernation of the inference context on a background thread and returns a task representing its completion.
- Warmup()
Eagerly ensures the underlying inference context is initialized and resident in memory.
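The two methods pair naturally into a warm-up/hibernate cycle. A hedged sketch, assuming the string argument to HibernateAsync(string) names an on-disk location (the member list above documents the parameter type but not its meaning):

if (chat is IKVCache cache)
{
    // Eagerly create the context so the first inference call
    // does not pay the initialization cost.
    cache.Warmup();

    // ... run inference ...

    // Schedule hibernation on a background thread; the returned
    // task completes when the context has been offloaded.
    // "session.kv" is an illustrative value for the string argument.
    await cache.HibernateAsync("session.kv");

    // The cache is rehydrated transparently on the next inference
    // call, or eagerly by calling Warmup() again.
}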