Can I Use My Own GGUF Model Files with LM-Kit.NET?
TL;DR
Yes. LM-Kit.NET loads any GGUF-format model file, whether from the built-in catalog, downloaded from HuggingFace, or converted from other sources. You can load from a local file path, an HTTPS URL (auto-downloaded and cached), or the model catalog by ID. The model must use a supported architecture (Llama, Qwen, Gemma, Mistral, Phi, and many others).
Three Ways to Load a Model
1. From a local file
```csharp
using LMKit.Model;

// Direct file path
using LM model = new LM("C:/models/my-custom-model-Q4_K_M.gguf");

// Or the same file via a file:// URI
using LM modelFromUri = new LM(new Uri("file:///C:/models/my-custom-model-Q4_K_M.gguf"));
```
2. From a HuggingFace URL (auto-download)
```csharp
using LMKit.Model;

// First run: downloads and caches locally
// Subsequent runs: loads from cache
using LM model = new LM(new Uri(
    "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
));
```
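The download URL above follows HuggingFace's standard "resolve" pattern (`https://huggingface.co/{repo}/resolve/{revision}/{file}`), so you can construct URIs for other GGUF repositories the same way. A minimal sketch, with the repository and filename taken from the example above (the helper function itself is illustrative, not part of LM-Kit.NET):

```python
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a HuggingFace direct-download ("resolve") URL for a file in a repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

print(hf_resolve_url("bartowski/Llama-3.2-3B-Instruct-GGUF",
                     "Llama-3.2-3B-Instruct-Q4_K_M.gguf"))
```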
3. From the built-in catalog
```csharp
using LMKit.Model;

// By model ID (catalog handles the URI for you)
using LM model = LM.LoadFromModelID("qwen3.5:9b");
```
Supported Architectures
LM-Kit.NET supports a wide range of model architectures through its llama.cpp backend. If a GGUF file uses one of these architectures, it will load:
| Category | Architectures |
|---|---|
| General text | Llama, Qwen 2/3/3.5, Gemma 2/3, Mistral 3, Phi 3, GPT-OSS, DeepSeek 2, Falcon H1, GLM 4, Granite, SmolLM 3, Nemotron |
| Mixture of Experts | Qwen 3.5 MoE, Nemotron H MoE |
| Vision | Qwen 2 VL, Qwen 3.5, MiniCPM-V, Pixtral, PaddleOCR |
| Embeddings | BERT, Nomic-BERT, Gemma Embedding |
| Speech | Whisper |
This list grows with each release as new architectures are added to the underlying inference engine.
Quantization Formats
LM-Kit.NET supports all GGUF quantization formats. The most common ones you will encounter:
| Format | Bits | Quality | Speed | Typical Use |
|---|---|---|---|---|
| Q4_K_M | 4-bit | Good | Fast | Default for all catalog models. Best general-purpose choice. |
| Q5_K_M | 5-bit | Better | Slightly slower | When you need slightly higher quality. |
| Q6_K | 6-bit | High | Slower | Near-original quality with larger file size. |
| Q8_0 | 8-bit | Very high | Slower, larger | Maximum quality among quantized formats, at roughly 2x the size of Q4_K_M. |
| Q2_K | 2-bit | Lower | Fastest | Extreme compression for very constrained devices. |
| F16 | 16-bit | Original | Slowest, largest | Full precision. Rarely needed for inference. |
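The bits column translates almost directly into file size: a GGUF file weighs roughly parameters × bits-per-weight / 8 bytes, plus a few percent of metadata. A rough back-of-the-envelope sketch, assuming approximate effective bits-per-weight values (k-quants keep some tensors at higher precision, so these sit slightly above the nominal bit counts):

```python
def estimate_gguf_gb(params: float, bits_per_weight: float, overhead: float = 1.05) -> float:
    """Rough on-disk size in GB: parameters x bits / 8 bytes, plus ~5% for metadata."""
    return params * bits_per_weight / 8 * overhead / 1e9

# Approximate effective bits-per-weight per format (illustrative values)
BITS = {"Q2_K": 2.6, "Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}

for fmt, bpw in BITS.items():
    print(f"7B at {fmt}: ~{estimate_gguf_gb(7e9, bpw):.1f} GB")
```

This is only a sizing heuristic for picking a quantization that fits your memory budget; the actual file size depends on the model's tensor layout.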
Using Models from Ollama or Other Tools
If you have GGUF files from Ollama, llama.cpp, GPT4All, or other tools, they work directly with LM-Kit.NET. Just point to the file:
```csharp
using LMKit.Model;

// Ollama stores models as GGUF files (without a .gguf extension) in its cache directory
using LM model = new LM("/home/user/.ollama/models/blobs/sha256-abc123");

// Or any GGUF file you downloaded from HuggingFace manually
using LM downloaded = new LM("/home/user/downloads/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf");
```
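Because Ollama blobs carry no `.gguf` extension, it can help to confirm a file really is GGUF before loading it. Every GGUF file begins with the 4-byte ASCII magic `GGUF` followed by a little-endian uint32 format version. A small standalone check (not part of LM-Kit.NET):

```python
import struct

def gguf_info(path: str):
    """Return (is_gguf, version) by reading the 8-byte GGUF header:
    4-byte ASCII magic 'GGUF', then a little-endian uint32 version."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return (False, None)
    (version,) = struct.unpack("<I", header[4:8])
    return (True, version)
```

If the check fails on an Ollama blob, you are likely looking at a manifest or adapter layer rather than the model weights themselves.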
Verifying Model Capabilities After Loading
After loading a custom model, you can inspect its properties to confirm it supports the features you need:
```csharp
using LMKit.Model;

using LM model = new LM("path/to/custom-model.gguf");

Console.WriteLine($"Name: {model.Name}");
Console.WriteLine($"Parameters: {model.Parameters}");
Console.WriteLine($"Context length: {model.ContextLength}");
Console.WriteLine($"Chat: {model.Capabilities.HasFlag(ModelCapabilities.Chat)}");
Console.WriteLine($"Tool calling: {model.Capabilities.HasFlag(ModelCapabilities.ToolsCall)}");
Console.WriteLine($"Vision: {model.Capabilities.HasFlag(ModelCapabilities.Vision)}");
```
📚 Related Content
- What model formats does LM-Kit.NET support?: Detailed explanation of GGUF, ONNX, and the LMK format used by catalog models.
- How do I choose the right model size for my hardware?: Match your custom model's file size to available memory.
- Model Catalog: Browse all pre-tested models with verified capabilities and download URIs.
- Understanding Model Loading and Caching: How the download cache works and where model files are stored.