What Model Formats Does LM-Kit.NET Support?


TL;DR

LM-Kit.NET primarily uses GGUF (GPT-Generated Unified Format) for LLM, VLM, and speech models. It also uses ONNX for specialized tasks like certain embedding computations. The built-in model catalog distributes models as .lmk files, which are GGUF files with LM-Kit metadata. Any standard GGUF file from HuggingFace or other sources works directly.


Format Overview

| Format | Used For | Extension | Source |
|--------|----------|-----------|--------|
| GGUF | LLM inference, chat, agents, vision, speech-to-text, embeddings | .gguf | HuggingFace, llama.cpp ecosystem |
| LMK | Same as GGUF (catalog distribution format) | .lmk | LM-Kit.NET model catalog |
| ONNX | Specialized inference tasks (used internally) | .onnx | Bundled with LM-Kit.NET runtime |

GGUF: The Primary Format

GGUF is the standard format for quantized language models in the llama.cpp ecosystem. It packs the model weights, tokenizer, and metadata into a single portable file. LM-Kit.NET uses llama.cpp as its inference backend, so it supports GGUF natively.
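Because GGUF is a single self-describing file, you can sanity-check one without any ML tooling: per the GGUF specification, every file starts with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A minimal Python sketch (the file path is a placeholder):

```python
import struct

def is_gguf(path: str) -> bool:
    """Return True if the file starts with a valid GGUF header."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    # The 4 bytes after the magic hold the format version (3 for current files).
    version = struct.unpack("<I", header[4:8])[0]
    return version >= 1
```

The same check passes for .lmk files from the LM-Kit catalog, since they are GGUF files underneath.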

Where to find GGUF models:

  • HuggingFace has thousands of GGUF models
  • The LM-Kit.NET Model Catalog provides curated, tested models
  • Community quantizers like TheBloke, bartowski, and others publish GGUF conversions

Standard quantization levels in GGUF:

| Quantization | Description |
|--------------|-------------|
| Q4_K_M | 4-bit with medium K-quant. Default for all LM-Kit catalog models. Best balance of size, speed, and quality. |
| Q5_K_M | 5-bit. Slightly better quality, ~25% larger files. |
| Q6_K | 6-bit. Near-original quality. |
| Q8_0 | 8-bit. Highest practical quality. Double the file size of Q4. |
| Q2_K / Q3_K | 2 to 3-bit. Maximum compression, lower quality. For very constrained devices. |
| F16 / F32 | Full precision. Used for fine-tuning inputs, not typical inference. |
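To see what those levels mean in practice, you can estimate file size from the parameter count and the effective bits per weight. The figures below are rough community-reported averages, not exact values (K-quants store some tensors at higher precision, so the effective rate sits above the nominal bit width):

```python
# Approximate effective bits per weight for common GGUF quantizations.
# These are rough averages for illustration, not exact spec values.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size in GB from parameter count and quantization."""
    total_bits = BITS_PER_WEIGHT[quant] * params_billions * 1e9
    return total_bits / 8 / 1e9  # bits -> bytes -> GB

for quant in BITS_PER_WEIGHT:
    print(f"7B model @ {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```

For a 7B model this puts Q4_K_M around 4 GB and F16 at 14 GB, which matches the ~3x savings the table describes.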

LMK: The Catalog Format

Models in the LM-Kit.NET catalog use the .lmk extension. These are GGUF files with additional LM-Kit metadata baked in (capability flags, recommended settings, family information). They are fully compatible with any GGUF tooling.

When you call LM.LoadFromModelID("qwen3.5:9b"), the SDK downloads the corresponding .lmk file from HuggingFace and caches it locally.


ONNX: Internal Runtime

LM-Kit.NET includes the ONNX Runtime (~12 to 30 MB depending on platform) for specialized inference tasks. This is used internally by certain processing pipelines. You do not need to manage ONNX models yourself; they are bundled with the SDK's native binaries.


Formats NOT Supported

LM-Kit.NET does not directly load these formats:

| Format | Alternative |
|--------|-------------|
| safetensors | Convert to GGUF using llama.cpp's convert_hf_to_gguf.py tool |
| PyTorch (.bin, .pt) | Convert to GGUF using llama.cpp conversion tools |
| GPTQ | Download or convert to GGUF quantization instead |
| AWQ | Download or convert to GGUF quantization instead |
| MLX | Apple-specific format. Use GGUF with Metal backend instead. |

The llama.cpp project provides tools for converting most popular formats to GGUF, e.g. `python convert_hf_to_gguf.py <model_dir> --outfile model.gguf`.
