Can I Use My Own GGUF Model Files with LM-Kit.NET?


TL;DR

Yes. LM-Kit.NET loads any GGUF-format model file, whether from the built-in catalog, downloaded from HuggingFace, or converted from other sources. You can load from a local file path, an HTTPS URL (auto-downloaded and cached), or the model catalog by ID. The model must use a supported architecture (Llama, Qwen, Gemma, Mistral, Phi, and many others).


Three Ways to Load a Model

1. From a local file

using LMKit.Model;

// Direct file path
using LM model = new LM("C:/models/my-custom-model-Q4_K_M.gguf");

// Or using a file:// URI (note the distinct variable name if both appear in one scope)
using LM modelFromUri = new LM(new Uri("file:///C:/models/my-custom-model-Q4_K_M.gguf"));

2. From a HuggingFace URL (auto-download)

using LMKit.Model;

// First run: downloads and caches locally
// Subsequent runs: loads from cache
using LM model = new LM(new Uri(
    "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
));
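If you need to control where the file lands (for example, on a build server with a shared model directory), you can also download the GGUF yourself with standard .NET HttpClient and load it from the local path. The sketch below assumes the same HuggingFace URL as above; only the LM constructor is LM-Kit.NET API, the rest is ordinary .NET:

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using LMKit.Model;

class ManualDownload
{
    static async Task Main()
    {
        const string url =
            "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf";
        string target = Path.Combine("models", "Llama-3.2-3B-Instruct-Q4_K_M.gguf");

        // Download once; reuse the local copy on later runs.
        if (!File.Exists(target))
        {
            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            using HttpClient http = new HttpClient();
            await using Stream source = await http.GetStreamAsync(url);
            await using FileStream file = File.Create(target);
            await source.CopyToAsync(file);
        }

        // Same local-path constructor as in section 1.
        using LM model = new LM(target);
    }
}
```

This gives you explicit control over the cache location at the cost of managing the download yourself; the URL constructor above handles both automatically.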

3. From the built-in catalog

using LMKit.Model;

// By model ID (catalog handles the URI for you)
using LM model = LM.LoadFromModelID("qwen3.5:9b");

Supported Architectures

LM-Kit.NET supports a wide range of model architectures through its llama.cpp backend. If a GGUF file uses one of these architectures, it will load:

Category | Architectures
General text | Llama, Qwen 2/3/3.5, Gemma 2/3, Mistral 3, Phi 3, GPT-OSS, DeepSeek 2, Falcon H1, GLM 4, Granite, SmolLM 3, Nemotron
Mixture of Experts | Qwen 3.5 MoE, Nemotron H MoE
Vision | Qwen 2 VL, Qwen 3.5, MiniCPM-V, Pixtral, PaddleOCR
Embeddings | BERT, Nomic-BERT, Gemma Embedding
Speech | Whisper

This list grows with each release as new architectures are added to the underlying inference engine.


Quantization Formats

LM-Kit.NET supports all GGUF quantization formats. The most common ones you will encounter:

Format | Bits | Quality | Speed | Typical Use
Q4_K_M | 4-bit | Good | Fast | Default for all catalog models. Best general-purpose choice.
Q5_K_M | 5-bit | Better | Slightly slower | When you need slightly higher quality.
Q6_K | 6-bit | High | Slower | Near-original quality with a larger file size.
Q8_0 | 8-bit | Very high | Slower, larger | Maximum quality at roughly 2x the size of Q4.
Q2_K | 2-bit | Lower | Fastest | Extreme compression for very constrained devices.
F16 | 16-bit | Original | Slowest, largest | Full precision. Rarely needed for inference.
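As a rule of thumb, a quantized file's size is roughly parameter count times bits per weight divided by 8, plus a few percent of overhead because some layers stay at higher precision. The helper below is purely illustrative arithmetic, not part of LM-Kit.NET, and the effective bits-per-weight figures are approximations:

```csharp
using System;
using System.Collections.Generic;

class QuantSizeEstimate
{
    // Approximate effective bits per weight for common GGUF formats.
    // Real files vary slightly; these are rough averages, not exact values.
    static readonly Dictionary<string, double> BitsPerWeight = new()
    {
        ["Q2_K"] = 2.6, ["Q4_K_M"] = 4.8, ["Q5_K_M"] = 5.7,
        ["Q6_K"] = 6.6, ["Q8_0"] = 8.5, ["F16"] = 16.0,
    };

    static double EstimateGiB(double billionParams, string format) =>
        billionParams * 1e9 * BitsPerWeight[format] / 8.0 / (1024.0 * 1024 * 1024);

    static void Main()
    {
        // Example: compare formats for a 7B-parameter model.
        foreach (string fmt in BitsPerWeight.Keys)
            Console.WriteLine($"7B @ {fmt,-6} ~ {EstimateGiB(7, fmt):F1} GiB");
    }
}
```

Comparing the estimate against available RAM or VRAM is a quick way to decide between, say, Q4_K_M and Q8_0 before downloading anything.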

Using Models from Ollama or Other Tools

If you have GGUF files from Ollama, llama.cpp, GPT4All, or other tools, they work directly with LM-Kit.NET. Just point to the file:

// Ollama stores models as GGUF files in its cache directory
using LM model = new LM("/home/user/.ollama/models/blobs/sha256-abc123");

// Or any GGUF file you downloaded from HuggingFace manually
using LM model = new LM("/home/user/downloads/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf");
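Ollama blobs have no .gguf extension, so it can be worth confirming a file really is GGUF before handing it to the LM constructor. Every GGUF file begins with the four ASCII magic bytes "GGUF"; the check below is a plain .NET sketch, not an LM-Kit.NET API:

```csharp
using System;
using System.IO;

static class GgufCheck
{
    // Returns true if the file starts with the GGUF magic bytes "GGUF".
    public static bool LooksLikeGguf(string path)
    {
        using FileStream fs = File.OpenRead(path);
        Span<byte> magic = stackalloc byte[4];
        return fs.Read(magic) == 4 &&
               magic[0] == (byte)'G' && magic[1] == (byte)'G' &&
               magic[2] == (byte)'U' && magic[3] == (byte)'F';
    }
}

// Usage:
// if (GgufCheck.LooksLikeGguf("/home/user/.ollama/models/blobs/sha256-abc123"))
// {
//     // safe to pass the path to new LM(...)
// }
```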

Verifying Model Capabilities After Loading

After loading a custom model, you can inspect its properties to confirm it supports the features you need:

using LMKit.Model;

using LM model = new LM("path/to/custom-model.gguf");

Console.WriteLine($"Name: {model.Name}");
Console.WriteLine($"Parameters: {model.Parameters}");
Console.WriteLine($"Context length: {model.ContextLength}");
Console.WriteLine($"Chat: {model.Capabilities.HasFlag(ModelCapabilities.Chat)}");
Console.WriteLine($"Tool calling: {model.Capabilities.HasFlag(ModelCapabilities.ToolsCall)}");
Console.WriteLine($"Vision: {model.Capabilities.HasFlag(ModelCapabilities.Vision)}");
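When a pipeline depends on a specific capability, you can fail fast with a guard built on the same flags shown above. The pattern below reuses the Capabilities property and ModelCapabilities enum from the snippet above; the helper class and exception message are illustrative, not library API:

```csharp
using System;
using LMKit.Model;

static class CapabilityGuard
{
    // Throws early if a custom model lacks a capability the pipeline needs,
    // instead of failing later with a less obvious error.
    public static void Require(LM model, ModelCapabilities needed)
    {
        if (!model.Capabilities.HasFlag(needed))
            throw new NotSupportedException(
                $"Model '{model.Name}' does not support {needed}.");
    }
}

// Usage:
// using LM model = new LM("path/to/custom-model.gguf");
// CapabilityGuard.Require(model, ModelCapabilities.Vision);
```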
