What Model Formats Does LM-Kit.NET Support?
TL;DR
LM-Kit.NET primarily uses GGUF (GPT-Generated Unified Format) for LLM, VLM, and speech models. It also uses ONNX for specialized tasks like certain embedding computations. The built-in model catalog distributes models as .lmk files, which are GGUF files with LM-Kit metadata. Any standard GGUF file from HuggingFace or other sources works directly.
Format Overview
| Format | Used For | Extension | Source |
|---|---|---|---|
| GGUF | LLM inference, chat, agents, vision, speech-to-text, embeddings | .gguf | HuggingFace, llama.cpp ecosystem |
| LMK | Same as GGUF (catalog distribution format) | .lmk | LM-Kit.NET model catalog |
| ONNX | Specialized inference tasks (used internally) | .onnx | Bundled with LM-Kit.NET runtime |
GGUF: The Primary Format
GGUF is the standard format for quantized language models in the llama.cpp ecosystem. It is a single-file format that contains the model weights, tokenizer, and metadata in one portable file. LM-Kit.NET uses llama.cpp as its inference backend, so it supports GGUF natively.
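Because GGUF is a single-file format with a fixed header, a file can be sanity-checked before loading. A minimal sketch in Python, assuming only the documented start of the GGUF header (the 4-byte magic `GGUF` followed by a little-endian uint32 version; everything after that is omitted here):

```python
import struct

def read_gguf_version(path: str) -> int:
    """Return the GGUF version from a file header, or raise if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":  # 4-byte magic that opens every GGUF file
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32 version
        return version
```

Since .lmk files are GGUF underneath, the same check passes on catalog downloads as well.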
Where to find GGUF models:
- HuggingFace has thousands of GGUF models
- The LM-Kit.NET Model Catalog provides curated, tested models
- Community quantizers like TheBloke, bartowski, and others publish GGUF conversions
Standard quantization levels in GGUF:
| Quantization | Description |
|---|---|
| Q4_K_M | 4-bit with medium K-quant. Default for all LM-Kit catalog models. Best balance of size, speed, and quality. |
| Q5_K_M | 5-bit. Slightly better quality, ~25% larger files. |
| Q6_K | 6-bit. Near-original quality. |
| Q8_0 | 8-bit. Highest practical quality. Roughly double the file size of Q4_K_M. |
| Q2_K / Q3_K | 2- and 3-bit. Maximum compression at reduced quality; for very constrained devices. |
| F16 / F32 | Full precision. Used for fine-tuning inputs, not typical inference. |
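Because each quantization level has a roughly fixed bits-per-weight cost, you can estimate a quantized file's size from the parameter count. A rough sketch; the bits-per-weight figures below are approximations (K-quants mix block sizes, so real files vary by a few percent per architecture):

```python
# Approximate average bits per weight for common GGUF quantization levels.
# These are rough figures, not exact; actual file sizes vary per model.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a model with the given parameter count."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB simplifies to:
    return params_billions * bits / 8

print(round(estimated_size_gb(8, "Q4_K_M"), 1))  # an 8B model at Q4_K_M: roughly 4.8 GB
```

This is why an 8B model that needs ~16 GB at F16 fits comfortably in ~5 GB at Q4_K_M, the catalog default.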
LMK: The Catalog Format
Models in the LM-Kit.NET catalog use the .lmk extension. These are GGUF files with additional LM-Kit metadata baked in (capability flags, recommended settings, family information). They are fully compatible with any GGUF tooling.
When you call LM.LoadFromModelID("qwen3.5:9b"), the SDK downloads the corresponding .lmk file from HuggingFace and caches it locally.
ONNX: Internal Runtime
LM-Kit.NET includes the ONNX Runtime (~12 to 30 MB depending on platform) for specialized inference tasks. This is used internally by certain processing pipelines. You do not need to manage ONNX models yourself; they are bundled with the SDK's native binaries.
Formats NOT Supported
LM-Kit.NET does not directly load these formats:
| Format | Alternative |
|---|---|
| safetensors | Convert to GGUF using llama.cpp's convert_hf_to_gguf.py tool |
| PyTorch (.bin, .pt) | Convert to GGUF using llama.cpp conversion tools |
| GPTQ | Download or convert to GGUF quantization instead |
| AWQ | Download or convert to GGUF quantization instead |
| MLX | Apple-specific format. Use GGUF with the Metal backend instead. |
The llama.cpp project ships conversion scripts that turn most popular formats into GGUF.
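As a sketch, a typical conversion builds a command against llama.cpp's convert_hf_to_gguf.py script. The paths below are placeholders, and the command assumes it is run from a llama.cpp checkout:

```python
def gguf_convert_cmd(hf_model_dir: str, outfile: str, outtype: str = "f16") -> list:
    """Build the command line for llama.cpp's convert_hf_to_gguf.py script."""
    return [
        "python", "convert_hf_to_gguf.py", hf_model_dir,
        "--outfile", outfile,
        "--outtype", outtype,  # e.g. f16, bf16, q8_0
    ]

# Placeholder paths; run the result with subprocess.run(cmd, check=True)
# from the llama.cpp repo root.
cmd = gguf_convert_cmd("models/my-hf-model", "models/my-model-f16.gguf")
```

Quantizing below 8-bit (e.g. to Q4_K_M) is then done as a second step with llama.cpp's quantization tool on the converted GGUF file.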
📚 Related Content
- Can I use my own GGUF model files with LM-Kit.NET?: Step-by-step guide to loading custom GGUF models from local files or URLs.
- Model Catalog: Browse all curated LMK/GGUF models with capabilities, sizes, and download links.
- How do I choose the right model size for my hardware?: Compare quantization levels and their impact on memory and quality.
- Glossary: Quantization: In-depth explanation of model quantization techniques and trade-offs.