Class Quantizer

Namespace
LMKit.Quantization
Assembly
LM-Kit.NET.dll

Provides functionality for quantizing LM models.

public sealed class Quantizer

Inheritance
object → Quantizer

Examples

Example: Quantize a model to 4-bit precision

using LMKit.Quantization;
using LMKit.Model;
using System;

// Create quantizer with source model path
var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");

// Set thread count for faster quantization
quantizer.ThreadCount = Environment.ProcessorCount;

// Quantize to 4-bit (Q4_K_M offers a good balance of size and quality)
quantizer.Quantize(
    dstFileName: "models/llama-3.2-3b-q4km.gguf",
    modelPrecision: LM.Precision.MOSTLY_Q4_K_M);

Console.WriteLine("Quantization complete!");

// Load and use the quantized model
LM model = LM.LoadFromPath("models/llama-3.2-3b-q4km.gguf");
Console.WriteLine($"Model loaded with {model.Precision} precision");

Example: Quantize to different precision levels

using LMKit.Quantization;
using LMKit.Model;
using System;

string sourceModel = "models/large-model-f16.gguf";

// Create multiple quantized versions
var quantizer = new Quantizer(sourceModel);

// 8-bit for highest quality
quantizer.Quantize("models/large-model-q8.gguf", LM.Precision.MOSTLY_Q8_0);

// 5-bit for balanced size/quality
quantizer.Quantize("models/large-model-q5km.gguf", LM.Precision.MOSTLY_Q5_K_M);

// 4-bit for smallest size
quantizer.Quantize("models/large-model-q4km.gguf", LM.Precision.MOSTLY_Q4_K_M);

Console.WriteLine("All quantization variants created");

Remarks

Quantization reduces model size and memory requirements by converting weights from higher precision (e.g., FP32, FP16) to lower precision formats (e.g., 4-bit, 8-bit). This enables running larger models on hardware with limited resources while maintaining acceptable quality.

Supported Precision Modes

  • Q4_0, Q4_1, Q4_K_S, Q4_K_M - 4-bit quantization (smallest size)
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M - 5-bit quantization (balanced)
  • Q6_K - 6-bit quantization (good quality)
  • Q8_0 - 8-bit quantization (high quality)
  • F16, F32 - Full precision (highest quality)
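As a rough guide to the size/quality trade-off above, weight storage scales with bits per weight. The sketch below uses nominal bit widths only; actual GGUF file sizes differ somewhat because K-quant formats mix block sizes and store per-block scaling metadata. The 3.2B parameter count is an illustrative assumption.

```csharp
using System;

class PrecisionSizeEstimate
{
    // Rough estimate: parameters × nominal bits per weight, in GiB.
    static double EstimateGiB(double parameters, double bitsPerWeight) =>
        parameters * bitsPerWeight / 8.0 / (1024.0 * 1024.0 * 1024.0);

    static void Main()
    {
        const double paramCount = 3.2e9; // e.g., a 3.2B-parameter model

        foreach (var (name, bits) in new[] {
            ("F16", 16.0), ("Q8_0", 8.0), ("Q5_K_M", 5.0), ("Q4_K_M", 4.0) })
        {
            Console.WriteLine($"{name,-7} ~{EstimateGiB(paramCount, bits):F1} GiB");
        }
    }
}
```

By this estimate, moving a 3.2B-parameter model from F16 to a 4-bit format cuts weight storage roughly fourfold, which is why 4-bit variants are the usual choice for memory-constrained hardware.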

Constructors

Quantizer(string)

Initializes a new instance of the Quantizer class with the specified model path.

Properties

ThreadCount

Gets or sets the number of threads to be used for processing.
Ensures that the thread count is always at least 1 to prevent invalid configurations.
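A short sketch of the property in use, assuming the clamping behavior described above: setting the count to the number of available cores speeds up quantization, and a non-positive value is raised to the documented minimum of 1 rather than causing an invalid configuration.

```csharp
using LMKit.Quantization;
using System;

var quantizer = new Quantizer("models/llama-3.2-3b-f16.gguf");

// Use all available cores for the quantization pass.
quantizer.ThreadCount = Environment.ProcessorCount;

// Per the documentation, the property never drops below 1,
// so a miscalculated value such as 0 is clamped instead of failing.
quantizer.ThreadCount = 0;
Console.WriteLine(quantizer.ThreadCount); // at least 1
```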

Methods

Quantize(string, Precision, bool, MetadataCollection)

Quantizes the model specified in the Quantizer(string) constructor and saves the quantized model to the specified destination file.
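A defensive calling pattern is sketched below. It uses only the two-argument named form shown in the examples above; the optional bool and MetadataCollection parameters are left at their defaults since their names and semantics are not documented here, and the IOException handler is an assumption about the likely failure mode (disk or file errors), not a documented contract.

```csharp
using LMKit.Quantization;
using LMKit.Model;
using System;
using System.IO;

string src = "models/llama-3.2-3b-f16.gguf";
string dst = "models/llama-3.2-3b-q4km.gguf";

// Fail early with a clear message if the source model is missing.
if (!File.Exists(src))
    throw new FileNotFoundException("Source model not found", src);

var quantizer = new Quantizer(src);
quantizer.ThreadCount = Environment.ProcessorCount;

try
{
    // Two-argument form from the examples above; the optional
    // parameters (bool, MetadataCollection) keep their defaults.
    quantizer.Quantize(dstFileName: dst,
                       modelPrecision: LM.Precision.MOSTLY_Q4_K_M);
    Console.WriteLine($"Wrote {dst}");
}
catch (IOException ex)
{
    Console.Error.WriteLine($"Quantization failed: {ex.Message}");
}
```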
