Class Quantizer

Namespace: LMKit.Quantization
Assembly: LM-Kit.NET.dll

Provides functionality for quantizing LM models.

public sealed class Quantizer
Inheritance: object → Quantizer

Examples

Example: Quantize a model to 4-bit precision

using LMKit.Quantization;
using LMKit.Model;
using System;

// Create quantizer with source model path
var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");

// Set thread count for faster quantization
quantizer.ThreadCount = Environment.ProcessorCount;

// Quantize to 4-bit (Q4_K_M offers good balance of size and quality)
quantizer.Quantize(
    dstFileName: "models/llama-3.2-3b-q4km.gguf",
    modelPrecision: LM.Precision.MOSTLY_Q4_K_M);

Console.WriteLine("Quantization complete!");

// Load and use the quantized model
LM model = LM.LoadFromPath("models/llama-3.2-3b-q4km.gguf");
Console.WriteLine($"Model loaded with {model.Precision} precision");

Example: Quantize to different precision levels

using LMKit.Quantization;
using LMKit.Model;
using System;

string sourceModel = "models/large-model-f16.gguf";

// Create multiple quantized versions
var quantizer = new Quantizer(sourceModel);

// 8-bit for highest quality
quantizer.Quantize("models/large-model-q8.gguf", LM.Precision.MOSTLY_Q8_0);

// 5-bit for balanced size/quality
quantizer.Quantize("models/large-model-q5km.gguf", LM.Precision.MOSTLY_Q5_K_M);

// 4-bit for smallest size
quantizer.Quantize("models/large-model-q4km.gguf", LM.Precision.MOSTLY_Q4_K_M);

Console.WriteLine("All quantization variants created");

Remarks

Quantization reduces model size and memory requirements by converting weights from higher precision (e.g., FP32, FP16) to lower precision formats (e.g., 4-bit, 8-bit). This enables running larger models on hardware with limited resources while maintaining acceptable quality.
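As a rough illustration of the savings, the sketch below estimates file sizes for a 3B-parameter model at several bit widths. The bits-per-weight figures (which fold in per-block scale overhead for the quantized formats) are common approximations, not values taken from the LM-Kit API, and real GGUF files will differ somewhat because some tensors are left at higher precision.

```csharp
using System;

// Back-of-the-envelope size estimate for a 3B-parameter model.
// Bits-per-weight values are approximations (including per-block
// scale overhead), not figures from the LM-Kit API.
long parameters = 3_000_000_000L;

foreach (var (name, bitsPerWeight) in new[] { ("F16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8) })
{
    double gigabytes = parameters * bitsPerWeight / 8 / 1e9;
    Console.WriteLine($"{name}: ~{gigabytes:F1} GB");
}
// F16:    ~6.0 GB
// Q8_0:   ~3.2 GB
// Q4_K_M: ~1.8 GB
```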

Supported Precision Modes

  • Q4_0, Q4_1, Q4_K_S, Q4_K_M - 4-bit quantization (smallest size)
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M - 5-bit quantization (balanced)
  • Q6_K - 6-bit quantization (good quality)
  • Q8_0 - 8-bit quantization (high quality)
  • F16, F32 - Full precision (highest quality)

Constructors

Quantizer(string)

Initializes a new instance of the Quantizer class with the specified model path.

Properties

ThreadCount

Gets or sets the number of threads used during quantization.
Values below 1 are clamped to 1 to prevent invalid configurations.

Methods

Quantize(string, Precision, bool, MetadataCollection)

Quantizes the model supplied to the Quantizer(string) constructor and saves the quantized model to the specified destination file.
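This excerpt does not describe the bool and MetadataCollection parameters of the overload above, so the sketch below sticks to the two documented arguments and adds basic error handling around the call. The caught exception type is an assumption for illustration, not a documented guarantee; consult the full API reference for the exceptions Quantize actually throws.

```csharp
using System;
using System.IO;
using LMKit.Quantization;
using LMKit.Model;

var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");

try
{
    // Only the two documented parameters are used here; the bool and
    // MetadataCollection parameters are omitted (semantics not covered
    // in this excerpt).
    quantizer.Quantize(
        dstFileName: "models/llama-3.2-3b-q4km.gguf",
        modelPrecision: LM.Precision.MOSTLY_Q4_K_M);
    Console.WriteLine("Quantization complete.");
}
catch (IOException ex) // assumed failure mode: unwritable destination, disk full, etc.
{
    Console.Error.WriteLine($"Quantization failed: {ex.Message}");
}
```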