Enum KVCacheType

Namespace: LMKit.Inference
Assembly: LM-Kit.NET.dll

Specifies the data type used to store a context's KV cache, which determines the cache's quantization level. Lower-precision types shrink the per-token cache footprint at some cost to numerical accuracy. F16 is the default and is treated as the unquantized baseline for caching purposes.

public enum KVCacheType

Fields

F32 = 0

Full 32-bit floating point. Highest precision, largest footprint.

F16 = 1

16-bit floating point. The default; treated as the unquantized baseline.

BF16 = 2

16-bit brain floating point (bfloat16). Same size as F16 with a wider exponent range.

Q8_0 = 3

8-bit quantization. About half the footprint of F16 with minor precision loss.

Q4_0 = 4

4-bit quantization. Smallest common footprint, larger precision loss.

Q4_1 = 5

4-bit quantization with a per-block scale and offset, slightly more accurate than Q4_0.

IQ4_NL = 6

4-bit non-linear quantization. 4-bit footprint with improved accuracy over Q4_0.

Q5_0 = 7

5-bit quantization. Between Q4_0 and Q8_0 in size and precision.

Q5_1 = 8

5-bit quantization with a per-block scale and offset, slightly more accurate than Q5_0.
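
To make the size trade-off between the fields above concrete, the following is a minimal sketch that estimates the per-token cache footprint for each type. It rests on assumptions that this page does not state: the bits-per-element figures assume the quantized types use the ggml block layouts of the same names (32 values per block plus a 16-bit scale, and a 16-bit offset for the `_1` variants), keys and values are assumed to share one type, and the model shape is hypothetical. `KvCacheFootprint`, `BitsPerElement`, and `BytesPerToken` are illustrative names, not part of LM-Kit.NET, and the enum is mirrored locally so the sketch compiles without the library.

```csharp
using System;

// Local mirror of the documented enum so this sketch builds on its own.
// Remove it and import LMKit.Inference when the library is referenced.
public enum KVCacheType
{
    F32 = 0, F16 = 1, BF16 = 2, Q8_0 = 3, Q4_0 = 4,
    Q4_1 = 5, IQ4_NL = 6, Q5_0 = 7, Q5_1 = 8
}

// Hypothetical helper, not an LM-Kit.NET API.
public static class KvCacheFootprint
{
    // Assumed storage cost in bits per cached element, including the
    // per-block scale (and offset, where present) spread over 32 values.
    public static double BitsPerElement(KVCacheType type) => type switch
    {
        KVCacheType.F32    => 32.0,
        KVCacheType.F16    => 16.0,
        KVCacheType.BF16   => 16.0,
        KVCacheType.Q8_0   => 8.5,  // 34 bytes per 32-value block
        KVCacheType.Q5_1   => 6.0,  // 24 bytes per block (scale + offset)
        KVCacheType.Q5_0   => 5.5,  // 22 bytes per block
        KVCacheType.Q4_1   => 5.0,  // 20 bytes per block (scale + offset)
        KVCacheType.Q4_0   => 4.5,  // 18 bytes per block
        KVCacheType.IQ4_NL => 4.5,  // 18 bytes per block
        _ => throw new ArgumentOutOfRangeException(nameof(type))
    };

    // Bytes needed to cache one token: a key and a value vector
    // (the factor 2) for every KV head in every layer.
    public static double BytesPerToken(KVCacheType type, int layers, int kvHeads, int headDim)
        => 2.0 * layers * kvHeads * headDim * BitsPerElement(type) / 8.0;
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical model shape, chosen only for illustration.
        const int layers = 32, kvHeads = 8, headDim = 128;

        foreach (KVCacheType type in Enum.GetValues(typeof(KVCacheType)))
        {
            double kib = KvCacheFootprint.BytesPerToken(type, layers, kvHeads, headDim) / 1024.0;
            Console.WriteLine($"{type,-7} {kib,7:F1} KiB/token");
        }
    }
}
```

Under these assumptions, F16 costs 128 KiB per token for the shape above and Q8_0 costs 68 KiB, which matches the "about half" description of Q8_0. Multiplying the per-token figure by the context length gives a rough total for the whole cache.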
