Enum KVCacheType

Namespace: LMKit.Inference

Assembly: LM-Kit.NET.dll

Specifies the data type used to store a context's KV-cache, which determines the cache's quantization level. Lower-precision types shrink the per-token cache footprint at some cost to numerical accuracy. F16 is the default and serves as the unquantized baseline for caching purposes.

public enum KVCacheType

Fields

F32 = 0: Full 32-bit floating point. Highest precision, largest footprint.
F16 = 1: 16-bit floating point. The default; treated as the unquantized baseline.
BF16 = 2: 16-bit brain floating point (bfloat16). Same size as F16 with a wider exponent range.
Q8_0 = 3: 8-bit quantization. About half the footprint of F16 with minor precision loss.
Q4_0 = 4: 4-bit quantization. Roughly a quarter of the F16 footprint, with larger precision loss than Q8_0.
Q4_1 = 5: 4-bit quantization with a per-block scale and offset, slightly more accurate than Q4_0.
IQ4_NL = 6: 4-bit non-linear quantization. 4-bit footprint with improved accuracy over Q4_0.
Q5_0 = 7: 5-bit quantization. Between Q4_0 and Q8_0 in size and precision.
Q5_1 = 8: 5-bit quantization with a per-block scale and offset, slightly more accurate than Q5_0.
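
To make the size trade-off between these fields concrete, the following is a minimal sketch that estimates KV-cache memory for each type. It does not call LM-Kit.NET: the enum is mirrored locally so the code compiles on its own, and the KvCacheEstimate class, its methods, and the model dimensions (32 layers, 8 KV heads, head size 128, 8192-token context) are hypothetical. The bytes-per-element figures assume ggml-style 32-element block layouts, which may not match what the library uses internally.

```csharp
using System;

// Local mirror of the enum documented above, so the sketch compiles without LM-Kit.NET.
public enum KVCacheType
{
    F32 = 0, F16 = 1, BF16 = 2, Q8_0 = 3, Q4_0 = 4,
    Q4_1 = 5, IQ4_NL = 6, Q5_0 = 7, Q5_1 = 8
}

// Hypothetical helper, not part of LM-Kit.NET.
public static class KvCacheEstimate
{
    // ASSUMPTION: quantized types use ggml-style blocks of 32 elements,
    // each with a 2-byte scale (and a 2-byte offset for the *_1 variants).
    public static double BytesPerElement(KVCacheType type)
    {
        switch (type)
        {
            case KVCacheType.F32: return 4.0;
            case KVCacheType.F16:
            case KVCacheType.BF16: return 2.0;
            case KVCacheType.Q8_0: return 34.0 / 32.0;   // 32 one-byte values + scale
            case KVCacheType.Q4_0:
            case KVCacheType.IQ4_NL: return 18.0 / 32.0; // 16 bytes of 4-bit values + scale
            case KVCacheType.Q4_1: return 20.0 / 32.0;   // Q4_0 layout + offset
            case KVCacheType.Q5_0: return 22.0 / 32.0;   // 4-bit values + 4 bytes of fifth bits + scale
            case KVCacheType.Q5_1: return 24.0 / 32.0;   // Q5_0 layout + offset
            default: throw new ArgumentOutOfRangeException(nameof(type));
        }
    }

    // Keys and values each store layers * kvHeads * headDim elements per token.
    public static double EstimateBytes(KVCacheType type, int layers, int kvHeads,
                                       int headDim, int contextLength)
    {
        double elements = 2.0 * layers * kvHeads * headDim * contextLength;
        return elements * BytesPerElement(type);
    }

    public static void Main()
    {
        // Hypothetical model dimensions, chosen only for illustration.
        foreach (KVCacheType type in Enum.GetValues(typeof(KVCacheType)))
        {
            double mib = EstimateBytes(type, 32, 8, 128, 8192) / (1024.0 * 1024.0);
            Console.WriteLine($"{type,-7} {mib,8:F1} MiB");
        }
    }
}
```

Under these assumptions, F16 comes to 1024 MiB and Q8_0 to 544 MiB for the example dimensions, which matches the "about half" description above. The block scale is why Q8_0 lands slightly above an exact half.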
