Can I Use My Own GGUF Model Files with LM-Kit.NET?


TL;DR

Yes. LM-Kit.NET loads any GGUF-format model file, whether from the built-in catalog, downloaded from HuggingFace, or converted from other sources. You can load from a local file path, an HTTPS URL (auto-downloaded and cached), or the model catalog by ID. The model must use a supported architecture (Llama, Qwen, Gemma, Mistral, Phi, and many others).


Three Ways to Load a Model

1. From a local file

using LMKit.Model;

// Direct file path
using LM model = new LM("C:/models/my-custom-model-Q4_K_M.gguf");

// Or using a file:// URI (note the distinct variable name if both appear in one scope)
using LM modelFromUri = new LM(new Uri("file:///C:/models/my-custom-model-Q4_K_M.gguf"));

2. From a HuggingFace URL (auto-download)

using LMKit.Model;

// First run: downloads and caches locally
// Subsequent runs: loads from cache
using LM model = new LM(new Uri(
    "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
));
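If you need to control where the file lands (for example, on a build server with a shared model directory), you can also download the GGUF yourself with standard .NET HttpClient and load it from the local path. The sketch below assumes the same HuggingFace URL as above; only the LM constructor is LM-Kit.NET API, the rest is ordinary .NET:

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using LMKit.Model;

class ManualDownload
{
    static async Task Main()
    {
        const string url =
            "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf";
        string target = Path.Combine("models", "Llama-3.2-3B-Instruct-Q4_K_M.gguf");

        // Download once; reuse the local copy on later runs.
        if (!File.Exists(target))
        {
            Directory.CreateDirectory(Path.GetDirectoryName(target)!);
            using HttpClient http = new HttpClient();
            await using Stream source = await http.GetStreamAsync(url);
            await using FileStream file = File.Create(target);
            await source.CopyToAsync(file);
        }

        // Same local-path constructor as in section 1.
        using LM model = new LM(target);
    }
}
```

This gives you explicit control over the cache location at the cost of managing the download yourself; the URL constructor above handles both automatically.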

3. From the built-in catalog

using LMKit.Model;

// By model ID (catalog handles the URI for you)
using LM model = LM.LoadFromModelID("qwen3.5:9b");

Supported Architectures

LM-Kit.NET supports a wide range of model architectures through its llama.cpp backend. If a GGUF file uses one of these architectures, it will load:

Category | Architectures
General text | Llama, Qwen 2/3/3.5, Gemma 2/3, Mistral 3, Phi 3, GPT-OSS, DeepSeek 2, Falcon H1, GLM 4, Granite, SmolLM 3, Nemotron
Mixture of Experts | Qwen 3.5 MoE, Nemotron H MoE
Vision | Qwen 2 VL, Qwen 3.5, MiniCPM-V, Pixtral, PaddleOCR
Embeddings | BERT, Nomic-BERT, Gemma Embedding
Speech | Whisper

This list grows with each release as new architectures are added to the underlying inference engine.


Quantization Formats

LM-Kit.NET supports all GGUF quantization formats. The most common ones you will encounter:

Format | Bits | Quality | Speed | Typical Use
Q4_K_M | 4-bit | Good | Fast | Default for all catalog models. Best general-purpose choice.
Q5_K_M | 5-bit | Better | Slightly slower | When you need slightly higher quality.
Q6_K | 6-bit | High | Slower | Near-original quality with a larger file size.
Q8_0 | 8-bit | Very high | Slower, larger | Maximum quality at roughly 2x the size of Q4.
Q2_K | 2-bit | Lower | Fastest | Extreme compression for very constrained devices.
F16 | 16-bit | Original | Slowest, largest | Full precision. Rarely needed for inference.
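As a rule of thumb, a quantized file's size is roughly parameter count times bits per weight divided by 8, plus a few percent of overhead because some layers stay at higher precision. The helper below is purely illustrative arithmetic, not part of LM-Kit.NET, and the effective bits-per-weight figures are approximations:

```csharp
using System;
using System.Collections.Generic;

class QuantSizeEstimate
{
    // Approximate effective bits per weight for common GGUF formats.
    // Real files vary slightly; these are rough averages, not exact values.
    static readonly Dictionary<string, double> BitsPerWeight = new()
    {
        ["Q2_K"] = 2.6, ["Q4_K_M"] = 4.8, ["Q5_K_M"] = 5.7,
        ["Q6_K"] = 6.6, ["Q8_0"] = 8.5, ["F16"] = 16.0,
    };

    static double EstimateGiB(double billionParams, string format) =>
        billionParams * 1e9 * BitsPerWeight[format] / 8.0 / (1024.0 * 1024 * 1024);

    static void Main()
    {
        // Example: compare formats for a 7B-parameter model.
        foreach (string fmt in BitsPerWeight.Keys)
            Console.WriteLine($"7B @ {fmt,-6} ~ {EstimateGiB(7, fmt):F1} GiB");
    }
}
```

Comparing the estimate against available RAM or VRAM is a quick way to decide between, say, Q4_K_M and Q8_0 before downloading anything.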

Using Models from Ollama or Other Tools

If you have GGUF files from Ollama, llama.cpp, GPT4All, or other tools, they work directly with LM-Kit.NET. Just point to the file:

// Ollama stores models as GGUF files in its cache directory
using LM model = new LM("/home/user/.ollama/models/blobs/sha256-abc123");

// Or any GGUF file you downloaded from HuggingFace manually
using LM model = new LM("/home/user/downloads/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf");
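Ollama blobs have no .gguf extension, so it can be worth confirming a file really is GGUF before handing it to the LM constructor. Every GGUF file begins with the four ASCII magic bytes "GGUF"; the check below is a plain .NET sketch, not an LM-Kit.NET API:

```csharp
using System;
using System.IO;

static class GgufCheck
{
    // Returns true if the file starts with the GGUF magic bytes "GGUF".
    public static bool LooksLikeGguf(string path)
    {
        using FileStream fs = File.OpenRead(path);
        Span<byte> magic = stackalloc byte[4];
        return fs.Read(magic) == 4 &&
               magic[0] == (byte)'G' && magic[1] == (byte)'G' &&
               magic[2] == (byte)'U' && magic[3] == (byte)'F';
    }
}

// Usage:
// if (GgufCheck.LooksLikeGguf("/home/user/.ollama/models/blobs/sha256-abc123"))
// {
//     // safe to pass the path to new LM(...)
// }
```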

Verifying Model Capabilities After Loading

After loading a custom model, you can inspect its properties to confirm it supports the features you need:

using LMKit.Model;

using LM model = new LM("path/to/custom-model.gguf");

Console.WriteLine($"Name: {model.Name}");
Console.WriteLine($"Parameters: {model.Parameters}");
Console.WriteLine($"Context length: {model.ContextLength}");
Console.WriteLine($"Chat: {model.Capabilities.HasFlag(ModelCapabilities.Chat)}");
Console.WriteLine($"Tool calling: {model.Capabilities.HasFlag(ModelCapabilities.ToolsCall)}");
Console.WriteLine($"Vision: {model.Capabilities.HasFlag(ModelCapabilities.Vision)}");
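When a pipeline depends on a specific capability, you can fail fast with a guard built on the same flags shown above. The pattern below reuses the Capabilities property and ModelCapabilities enum from the snippet above; the helper class and exception message are illustrative, not library API:

```csharp
using System;
using LMKit.Model;

static class CapabilityGuard
{
    // Throws early if a custom model lacks a capability the pipeline needs,
    // instead of failing later with a less obvious error.
    public static void Require(LM model, ModelCapabilities needed)
    {
        if (!model.Capabilities.HasFlag(needed))
            throw new NotSupportedException(
                $"Model '{model.Name}' does not support {needed}.");
    }
}

// Usage:
// using LM model = new LM("path/to/custom-model.gguf");
// CapabilityGuard.Require(model, ModelCapabilities.Vision);
```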
