Field EnableKVCacheQuantization

Namespace
LMKit.Global
Assembly
LM-Kit.NET.dll

Gets or sets a value indicating whether the conversation key-value (KV) cache is stored in a quantized form instead of full precision.

public static bool EnableKVCacheQuantization

Returns

bool

true to quantize the KV cache; otherwise, false. Default is false.

Remarks

When enabled, eligible contexts hold their KV cache at 8-bit precision (Q8_0) rather than 16-bit, roughly halving the cache's memory footprint so longer contexts fit on the same device, at a small generation-quality cost. The setting is read when a context is created, so it affects newly created contexts only; contexts already in memory keep the precision they were built with. Quantization is skipped for models that do not support it.
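
The "roughly halving" figure can be checked with a rough size estimate. The sketch below is illustrative only and is not part of the LM-Kit.NET API: the KvCacheEstimate class and its members are hypothetical names, and it assumes Q8_0 uses the common ggml block layout (32 one-byte values plus one 16-bit scale per block, about 8.5 bits per value), which this page does not state. Actual allocations may differ because of padding or other buffers.

```csharp
using System;

// Illustrative estimate only; not part of the LM-Kit.NET API.
public static class KvCacheEstimate
{
    // Assumed Q8_0 block layout: 32 one-byte values plus one 2-byte scale.
    public const int Q8BlockValues = 32;
    public const int Q8BlockBytes = 34;

    // Cached values: one key and one value vector per layer, KV head, and position.
    public static long Elements(int layers, int kvHeads, int headDim, int contextLength)
        => 2L * layers * kvHeads * headDim * contextLength;

    // 16-bit storage: 2 bytes per value.
    public static long BytesF16(long elements) => elements * 2;

    // Q8_0 storage: round up to whole blocks, then 34 bytes per block.
    public static long BytesQ8_0(long elements)
        => (elements + Q8BlockValues - 1) / Q8BlockValues * Q8BlockBytes;
}
```

For a hypothetical model with 32 layers, 8 KV heads, a head dimension of 128, and an 8192-token context, Elements returns 536,870,912. BytesF16 then gives 1,073,741,824 bytes (1 GiB) and BytesQ8_0 gives 570,425,344 bytes (544 MiB), about 53% of the 16-bit size under these assumptions.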
