Property EnableKVCacheQuantization

Namespace: LMKit.Global

Assembly: LM-Kit.NET.dll

EnableKVCacheQuantization

Gets or sets a value indicating whether the conversation key-value (KV) cache is stored in a quantized form instead of full precision.

public static bool EnableKVCacheQuantization { get; set; }

Property Value

bool: true to quantize the KV cache; otherwise, false. Default is true.

Remarks

When enabled, eligible contexts store their KV cache at 8-bit precision (Q8_0) instead of 16-bit. This roughly halves the cache's memory footprint, so longer contexts fit on the same device, at a small cost in generation quality. The setting is read when a context is created, so it applies only to contexts created afterward; contexts already in memory keep the precision they were built with. Quantization is skipped for models that do not support it.
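To make the "roughly half" figure concrete, the sketch below estimates the KV cache size at 16-bit and at Q8_0 precision. It is an illustration, not part of the LM-Kit API: the KvCacheEstimate class and its methods are hypothetical, and it assumes Q8_0 follows the common ggml block layout (32 one-byte values plus one 2-byte scale, 34 bytes per block). The model dimensions are example values, not defaults of any particular model.

```csharp
using System;

// Hypothetical helper for illustration only; not part of LM-Kit.NET.
public static class KvCacheEstimate
{
    // Assumed ggml-style Q8_0 block: 32 one-byte values plus one 2-byte scale.
    private const int Q8BlockValues = 32;
    private const int Q8BlockBytes = 34;

    // Total cached values: one key vector and one value vector
    // per layer, per KV head, per token.
    public static long ElementCount(int layers, int kvHeads, int headDim, int contextTokens)
        => 2L * layers * kvHeads * headDim * contextTokens;

    // 16-bit storage: two bytes per value.
    public static long BytesF16(long elements) => elements * 2;

    // Q8_0 storage: 34 bytes per block of 32 values, rounded up to whole blocks.
    // Real caches quantize row by row; this matches when headDim is a multiple of 32.
    public static long BytesQ8_0(long elements)
        => (elements + Q8BlockValues - 1) / Q8BlockValues * Q8BlockBytes;
}
```

For example, with 32 layers, 8 KV heads, a head dimension of 128, and an 8192-token context, ElementCount returns 536,870,912 values. BytesF16 gives 1,073,741,824 bytes (1 GiB) and BytesQ8_0 gives 570,425,344 bytes (544 MiB), about 53% of the 16-bit size. The per-block scale is why the saving is slightly less than half.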
