What Model Formats Does LM-Kit.NET Support?
TL;DR
LM-Kit.NET primarily uses GGUF (GPT-Generated Unified Format) for LLM, VLM, and speech models. It also uses ONNX for specialized tasks like certain embedding computations. The built-in model catalog distributes models as .lmk files, which are GGUF files with LM-Kit metadata. Any standard GGUF file from HuggingFace or other sources works directly.
Format Overview
| Format | Used For | Extension | Source |
|---|---|---|---|
| GGUF | LLM inference, chat, agents, vision, speech-to-text, embeddings | .gguf | HuggingFace, llama.cpp ecosystem |
| LMK | Same as GGUF (catalog distribution format) | .lmk | LM-Kit.NET model catalog |
| ONNX | Specialized inference tasks (used internally) | .onnx | Bundled with LM-Kit.NET runtime |
GGUF: The Primary Format
GGUF is the standard format for quantized language models in the llama.cpp ecosystem. It is a single-file format that contains the model weights, tokenizer, and metadata in one portable file. LM-Kit.NET uses llama.cpp as its inference backend, so it supports GGUF natively.
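Because GGUF is a single-file format with a fixed header, a file can be sanity-checked before loading. A minimal sketch in Python, assuming only the documented start of the GGUF header (the 4-byte magic `GGUF` followed by a little-endian uint32 version; everything after that is omitted here):

```python
import struct

def read_gguf_version(path: str) -> int:
    """Return the GGUF version from a file header, or raise if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":  # 4-byte magic that opens every GGUF file
            raise ValueError(f"{path} is not a GGUF file")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32 version
        return version
```

Since .lmk files are GGUF underneath, the same check passes on catalog downloads as well.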
Where to find GGUF models:
- HuggingFace has thousands of GGUF models
- The LM-Kit.NET Model Catalog provides curated, tested models
- Community quantizers like TheBloke, bartowski, and others publish GGUF conversions
Standard quantization levels in GGUF:
| Quantization | Description |
|---|---|
| Q4_K_M | 4-bit with medium K-quant. Default for all LM-Kit catalog models. Best balance of size, speed, and quality. |
| Q5_K_M | 5-bit. Slightly better quality, ~25% larger files. |
| Q6_K | 6-bit. Near-original quality. |
| Q8_0 | 8-bit. Highest practical quality. Roughly double the file size of Q4_K_M. |
| Q2_K / Q3_K | 2- and 3-bit. Maximum compression at reduced quality; for very constrained devices. |
| F16 / F32 | Full precision. Used for fine-tuning inputs, not typical inference. |
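Because each quantization level has a roughly fixed bits-per-weight cost, you can estimate a quantized file's size from the parameter count. A rough sketch; the bits-per-weight figures below are approximations (K-quants mix block sizes, so real files vary by a few percent per architecture):

```python
# Approximate average bits per weight for common GGUF quantization levels.
# These are rough figures, not exact; actual file sizes vary per model.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Estimate GGUF file size in GB for a model with the given parameter count."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB simplifies to:
    return params_billions * bits / 8

print(round(estimated_size_gb(8, "Q4_K_M"), 1))  # an 8B model at Q4_K_M: roughly 4.8 GB
```

This is why an 8B model that needs ~16 GB at F16 fits comfortably in ~5 GB at Q4_K_M, the catalog default.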
LMK: The Catalog Format
Models in the LM-Kit.NET catalog use the .lmk extension. These are GGUF files with additional LM-Kit metadata baked in (capability flags, recommended settings, family information). They are fully compatible with any GGUF tooling.
When you call LM.LoadFromModelID("qwen3.5:9b"), the SDK downloads the corresponding .lmk file from HuggingFace and caches it locally.
ONNX: Internal Runtime
LM-Kit.NET includes the ONNX Runtime (~12 to 30 MB depending on platform) for specialized inference tasks. This is used internally by certain processing pipelines. You do not need to manage ONNX models yourself; they are bundled with the SDK's native binaries.
Formats NOT Supported
LM-Kit.NET does not directly load these formats:
| Format | Alternative |
|---|---|
| safetensors | Convert to GGUF using llama.cpp's convert_hf_to_gguf.py tool |
| PyTorch (.bin, .pt) | Convert to GGUF using llama.cpp conversion tools |
| GPTQ | Download or convert to GGUF quantization instead |
| AWQ | Download or convert to GGUF quantization instead |
| MLX | Apple-specific format. Use GGUF with the Metal backend instead. |
The llama.cpp project ships conversion scripts that turn most popular formats into GGUF.
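As a sketch, a typical conversion builds a command against llama.cpp's convert_hf_to_gguf.py script. The paths below are placeholders, and the command assumes it is run from a llama.cpp checkout:

```python
def gguf_convert_cmd(hf_model_dir: str, outfile: str, outtype: str = "f16") -> list:
    """Build the command line for llama.cpp's convert_hf_to_gguf.py script."""
    return [
        "python", "convert_hf_to_gguf.py", hf_model_dir,
        "--outfile", outfile,
        "--outtype", outtype,  # e.g. f16, bf16, q8_0
    ]

# Placeholder paths; run the result with subprocess.run(cmd, check=True)
# from the llama.cpp repo root.
cmd = gguf_convert_cmd("models/my-hf-model", "models/my-model-f16.gguf")
```

Quantizing below 8-bit (e.g. to Q4_K_M) is then done as a second step with llama.cpp's quantization tool on the converted GGUF file.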
📚 Related Content
- Can I use my own GGUF model files with LM-Kit.NET?: Step-by-step guide to loading custom GGUF models from local files or URLs.
- Model Catalog: Browse all curated LMK/GGUF models with capabilities, sizes, and download links.
- How do I choose the right model size for my hardware?: Compare quantization levels and their impact on memory and quality.
- Glossary: Quantization: In-depth explanation of model quantization techniques and trade-offs.