Enum LLM.Precision
Represents the different precision types for LLMs.
public enum LLM.Precision
Fields
ALL_F32 = 0
Full precision using 32-bit floating point (FP32).
MOSTLY_F16 = 1
Mixed precision using mostly 16-bit floating point (FP16), except for 1D tensors.
MOSTLY_Q4_0 = 2
Quantized precision using mostly 4-bit integers (Q4_0), except for 1D tensors.
MOSTLY_Q4_1 = 3
Quantized precision using mostly 4-bit integers (Q4_1), except for 1D tensors.
MOSTLY_Q8_0 = 7
Quantized precision using mostly 8-bit integers (Q8_0), except for 1D tensors.
MOSTLY_Q5_0 = 8
Quantized precision using mostly 5-bit integers (Q5_0), except for 1D tensors.
MOSTLY_Q5_1 = 9
Quantized precision using mostly 5-bit integers (Q5_1), except for 1D tensors.
MOSTLY_Q2_K = 10
Quantized precision using mostly 2-bit k-quants (Q2_K), except for 1D tensors.
MOSTLY_Q3_K_S = 11
Quantized precision using mostly 3-bit k-quants, small variant (Q3_K_S), except for 1D tensors.
MOSTLY_Q3_K_M = 12
Quantized precision using mostly 3-bit k-quants, medium variant (Q3_K_M), except for 1D tensors.
MOSTLY_Q3_K_L = 13
Quantized precision using mostly 3-bit k-quants, large variant (Q3_K_L), except for 1D tensors.
MOSTLY_Q4_K_S = 14
Quantized precision using mostly 4-bit k-quants, small variant (Q4_K_S), except for 1D tensors.
MOSTLY_Q4_K_M = 15
Quantized precision using mostly 4-bit k-quants, medium variant (Q4_K_M), except for 1D tensors.
MOSTLY_Q5_K_S = 16
Quantized precision using mostly 5-bit k-quants, small variant (Q5_K_S), except for 1D tensors.
MOSTLY_Q5_K_M = 17
Quantized precision using mostly 5-bit k-quants, medium variant (Q5_K_M), except for 1D tensors.
MOSTLY_Q6_K = 18
Quantized precision using mostly 6-bit k-quants (Q6_K), except for 1D tensors.
MOSTLY_IQ2_XXS = 19
Quantized precision using mostly 2-bit integers, extra extra small size (IQ2_XXS), except for 1D tensors.
MOSTLY_IQ2_XS = 20
Quantized precision using mostly 2-bit integers, extra small size (IQ2_XS), except for 1D tensors.
MOSTLY_Q2_K_S = 21
Quantized precision using mostly 2-bit k-quants, small variant (Q2_K_S), except for 1D tensors.
MOSTLY_IQ3_XS = 22
Quantized precision using mostly 3-bit integers, extra small size (IQ3_XS), except for 1D tensors.
MOSTLY_IQ3_XXS = 23
Quantized precision using mostly 3-bit integers, extra extra small size (IQ3_XXS), except for 1D tensors.
MOSTLY_IQ1_S = 24
Quantized precision using mostly 1-bit integers, small size (IQ1_S), except for 1D tensors.
MOSTLY_IQ4_NL = 25
Quantized precision using mostly 4-bit integers, non-linear (IQ4_NL), except for 1D tensors.
MOSTLY_IQ3_S = 26
Quantized precision using mostly 3-bit integers, small size (IQ3_S), except for 1D tensors.
MOSTLY_IQ3_M = 27
Quantized precision using mostly 3-bit integers, medium size (IQ3_M), except for 1D tensors.
MOSTLY_IQ2_S = 28
Quantized precision using mostly 2-bit integers, small size (IQ2_S), except for 1D tensors.
MOSTLY_IQ2_M = 29
Quantized precision using mostly 2-bit integers, medium size (IQ2_M), except for 1D tensors.
MOSTLY_IQ4_XS = 30
Quantized precision using mostly 4-bit integers, extra small size (IQ4_XS), except for 1D tensors.
MOSTLY_IQ1_M = 31
Quantized precision using mostly 1-bit integers, medium size (IQ1_M), except for 1D tensors.
MOSTLY_BF16 = 32
Mixed precision using mostly 16-bit brain floating point (BF16), except for 1D tensors.
MOSTLY_Q4_0_4_4 = 33
Quantized precision using mostly 4-bit integers (Q4_0), stored in an interleaved block layout (Q4_0_4_4), except for 1D tensors.
MOSTLY_Q4_0_4_8 = 34
Quantized precision using mostly 4-bit integers (Q4_0), stored in an interleaved block layout (Q4_0_4_8), except for 1D tensors.
MOSTLY_Q4_0_8_8 = 35
Quantized precision using mostly 4-bit integers (Q4_0), stored in an interleaved block layout (Q4_0_8_8), except for 1D tensors.
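MOSTLY_TQ1_0 = 36
Quantized precision using mostly ternary values (TQ1_0), except for 1D tensors.
MOSTLY_TQ2_0 = 37
Quantized precision using mostly ternary values (TQ2_0), except for 1D tensors.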
GUESSED = 1024
Precision type is guessed because it is not specified in the model file.
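Examples
The following is a minimal sketch of how a caller might interpret a raw precision value. It rests on two assumptions that this page does not confirm: that GUESSED can be combined with a base value as a flag bit (the convention used by llama.cpp's file-type enum, which these values appear to mirror), and that a raw value may fall in a gap such as 4 to 6, which the enum does not define. The `Precision` enum below is a local stand-in that copies a few values from the list above so the sample compiles on its own; `PrecisionExample`, `Split`, and `Describe` are hypothetical names, not part of the library.

```csharp
using System;

// Local stand-in for LLM.Precision with a few values copied from the list above.
// In real code, use the library's own enum instead of redeclaring it.
public enum Precision
{
    ALL_F32 = 0,
    MOSTLY_F16 = 1,
    MOSTLY_Q4_K_M = 15,
    GUESSED = 1024,
}

public static class PrecisionExample
{
    // Assumption: GUESSED may be OR'ed onto a base precision as a flag bit.
    // Splits a raw value into the base precision and the guessed flag.
    public static (Precision Base, bool Guessed) Split(Precision raw)
    {
        int value = (int)raw;
        int flag = (int)Precision.GUESSED;
        bool guessed = (value & flag) != 0;
        return ((Precision)(value & ~flag), guessed);
    }

    // Builds a readable label, marking base values the enum does not define.
    public static string Describe(Precision raw)
    {
        var (basePrecision, guessed) = Split(raw);
        string name = Enum.IsDefined(typeof(Precision), basePrecision)
            ? basePrecision.ToString()
            : $"unknown ({(int)basePrecision})";
        return guessed ? name + " (guessed)" : name;
    }

    public static void Main()
    {
        Console.WriteLine(Describe((Precision)(15 | 1024)));
        Console.WriteLine(Describe(Precision.MOSTLY_F16));
        Console.WriteLine(Describe((Precision)5));
    }
}
```

If the library only ever reports GUESSED on its own rather than as a flag, `Split` still behaves sensibly: it returns ALL_F32 (value 0) as the base with the guessed flag set, so callers should treat the base as unreliable whenever the flag is true.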