Class Quantizer
- Namespace: LMKit.Quantization
- Assembly: LM-Kit.NET.dll
Provides functionality for quantizing LM models.
public sealed class Quantizer
- Inheritance
- object → Quantizer
- Inherited Members
- object.Equals(object), object.GetHashCode(), object.GetType(), object.ToString()
Examples
Example: Quantize a model to 4-bit precision
using LMKit.Quantization;
using LMKit.Model;
using System;
// Create quantizer with source model path
var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");
// Set thread count for faster quantization
quantizer.ThreadCount = Environment.ProcessorCount;
// Quantize to 4-bit (Q4_K_M offers good balance of size and quality)
quantizer.Quantize(
    dstFileName: "models/llama-3.2-3b-q4km.gguf",
    modelPrecision: LM.Precision.MOSTLY_Q4_K_M);
Console.WriteLine("Quantization complete!");
// Load and use the quantized model
LM model = LM.LoadFromPath("models/llama-3.2-3b-q4km.gguf");
Console.WriteLine($"Model loaded with {model.Precision} precision");
Example: Quantize to different precision levels
using LMKit.Quantization;
using LMKit.Model;
using System;
string sourceModel = "models/large-model-f16.gguf";
// Create multiple quantized versions
var quantizer = new Quantizer(sourceModel);
// 8-bit for highest quality
quantizer.Quantize("models/large-model-q8.gguf", LM.Precision.MOSTLY_Q8_0);
// 5-bit for balanced size/quality
quantizer.Quantize("models/large-model-q5km.gguf", LM.Precision.MOSTLY_Q5_K_M);
// 4-bit for smallest size
quantizer.Quantize("models/large-model-q4km.gguf", LM.Precision.MOSTLY_Q4_K_M);
Console.WriteLine("All quantization variants created");
Remarks
Quantization reduces model size and memory requirements by converting weights from higher-precision formats (e.g., FP32, FP16) to lower-precision formats (e.g., 8-bit or 4-bit integers). This makes it possible to run larger models on hardware with limited resources while maintaining acceptable quality. The supported modes are listed below, and the sketch after the list illustrates the resulting size trade-off.
Supported Precision Modes
- Q4_0, Q4_1, Q4_K_S, Q4_K_M - 4-bit quantization (smallest size)
- Q5_0, Q5_1, Q5_K_S, Q5_K_M - 5-bit quantization (balanced)
- Q6_K - 6-bit quantization (good quality)
- Q8_0 - 8-bit quantization (high quality)
- F16, F32 - 16-bit and 32-bit floating point (no quantization; highest quality)
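To make the size trade-off concrete, the following sketch quantizes one source model to three of the modes listed above and prints each output file's size. It relies only on the calls shown in the examples above (the Quantizer(string) constructor and Quantize with a destination path and precision); the file paths are illustrative, and the size reporting is plain .NET I/O rather than part of the Quantizer API.
using LMKit.Quantization;
using LMKit.Model;
using System;
using System.IO;
string source = "models/large-model-f16.gguf";
var quantizer = new Quantizer(source);
// Quantize to each target precision and report the resulting file size.
var targets = new (string Path, LM.Precision Precision)[]
{
    ("models/large-model-q8.gguf", LM.Precision.MOSTLY_Q8_0),
    ("models/large-model-q5km.gguf", LM.Precision.MOSTLY_Q5_K_M),
    ("models/large-model-q4km.gguf", LM.Precision.MOSTLY_Q4_K_M)
};
foreach (var (path, precision) in targets)
{
    quantizer.Quantize(path, precision);
    long sizeMiB = new FileInfo(path).Length / (1024 * 1024);
    Console.WriteLine($"{precision}: {sizeMiB} MiB");
}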
Constructors
- Quantizer(string)
Initializes a new instance of the Quantizer class with the specified model path.
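A minimal sketch of constructing a Quantizer from a source model path. Only the Quantizer(string) constructor documented above is part of the API; the existence check is ordinary .NET, and the path is illustrative.
using System;
using System.IO;
using LMKit.Quantization;
string sourcePath = "models/llama-3.2-3b-f16.gguf";
if (!File.Exists(sourcePath))
{
    throw new FileNotFoundException("Source model not found.", sourcePath);
}
var quantizer = new Quantizer(sourcePath);
Console.WriteLine($"Ready to quantize {sourcePath}");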
Properties
- ThreadCount
Gets or sets the number of threads used for quantization.
Values assigned below 1 are raised to 1 to prevent invalid configurations.
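A brief sketch of configuring ThreadCount, assuming the clamping behavior described above (assigned values below 1 are raised to 1); the model path is illustrative.
using System;
using LMKit.Quantization;
var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");
// Use every logical processor for the conversion.
quantizer.ThreadCount = Environment.ProcessorCount;
// Assigning a value below 1 is corrected to at least 1 per the description above,
// so this cannot leave the quantizer with an invalid thread configuration.
quantizer.ThreadCount = 0;
Console.WriteLine(quantizer.ThreadCount); // expected to print 1, per the clamping rule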
Methods
- Quantize(string, Precision, bool, MetadataCollection)
Quantizes the model supplied to the Quantizer(string) constructor and saves the quantized model to the specified destination file.
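The examples above call Quantize with only a destination path and precision, so the trailing bool and MetadataCollection parameters are treated here as optional; their semantics are not documented in this section. A hedged sketch of a guarded call follows; the IOException handling is a generic .NET pattern, not a documented behavior of Quantize.
using System;
using System.IO;
using LMKit.Quantization;
using LMKit.Model;
var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");
try
{
    // Two-argument call, matching the examples above.
    quantizer.Quantize(
        dstFileName: "models/llama-3.2-3b-q5km.gguf",
        modelPrecision: LM.Precision.MOSTLY_Q5_K_M);
    Console.WriteLine("Quantized model written.");
}
catch (IOException ex)
{
    // The destination may be unwritable or the disk full; the exact exceptions
    // thrown by Quantize are not documented in this section.
    Console.WriteLine($"Quantization failed: {ex.Message}");
}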