Can LM-Kit.NET Work with Models from Hugging Face?


TL;DR

Yes, directly. You can load any GGUF model from Hugging Face by passing its HTTPS URL to the LM constructor. LM-Kit.NET downloads the file, verifies its checksum, caches it locally, and loads it. The built-in model catalog also points to Hugging Face for all its downloads. However, only the GGUF format is supported. Models in safetensors, PyTorch, GPTQ, or AWQ format must be converted to GGUF first.


Loading a Model from Hugging Face

Pass the direct download URL of a GGUF file on Hugging Face:

using LMKit.Model;

// Load any GGUF model directly from Hugging Face
using LM model = new LM(new Uri(
    "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
));

The SDK handles:

  1. Downloading the file with progress callbacks and retry logic.
  2. Checksum verification against Hugging Face metadata (SHA256).
  3. Local caching so subsequent loads skip the download.
  4. Model loading into memory with GPU offloading if available.
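
The checksum step can also be reproduced by hand with the .NET standard library. A minimal sketch of the kind of integrity check described in step 2 (the helper name is illustrative, not part of the LM-Kit.NET API):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Verify a file against an expected SHA256 hash, mirroring the kind of
// integrity check the SDK performs after a download.
static bool VerifySha256(string filePath, string expectedHex)
{
    using var sha = SHA256.Create();
    using FileStream stream = File.OpenRead(filePath);
    byte[] hash = sha.ComputeHash(stream);
    return Convert.ToHexString(hash).Equals(expectedHex, StringComparison.OrdinalIgnoreCase);
}

// Demo with a standard SHA-256 test vector: the three bytes "abc".
File.WriteAllText("sample.bin", "abc");
bool ok = VerifySha256("sample.bin",
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
Console.WriteLine(ok ? "checksum match" : "checksum mismatch");
```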

You can track download progress with a callback:

using LM model = new LM(
    new Uri("https://huggingface.co/..."),
    downloadingProgress: (sender, progress) =>
    {
        Console.Write($"\rDownloading: {progress.ProgressPercentage:F1}%");
    }
);

The Built-In Model Catalog

The LM-Kit.NET model catalog is a curated set of models, all hosted on Hugging Face. Loading from the catalog uses the same download mechanism but with a simpler API:

// Load by model ID (downloads from Hugging Face automatically)
using LM model = LM.LoadFromModelID("qwen3.5:9b");

Catalog models use the .lmk file extension, which is standard GGUF with additional LM-Kit metadata (capability flags, recommended settings). Any .lmk file is fully GGUF-compatible.
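
Because .lmk is plain GGUF underneath, the compatibility claim is easy to check yourself: every GGUF file begins with the four ASCII magic bytes "GGUF". A minimal sketch (the demo writes a synthetic header instead of downloading a real model):

```csharp
using System;
using System.IO;

// Every GGUF file, and therefore every .lmk file, starts with the
// ASCII magic bytes "GGUF".
static bool HasGgufMagic(string path)
{
    using FileStream stream = File.OpenRead(path);
    Span<byte> magic = stackalloc byte[4];
    return stream.Read(magic) == 4
        && magic[0] == (byte)'G' && magic[1] == (byte)'G'
        && magic[2] == (byte)'U' && magic[3] == (byte)'F';
}

// Demo with a synthetic header; a real check would point at a downloaded .lmk file.
File.WriteAllBytes("demo.lmk", new byte[] { (byte)'G', (byte)'G', (byte)'U', (byte)'F', 3, 0, 0, 0 });
Console.WriteLine(HasGgufMagic("demo.lmk") ? "GGUF-compatible" : "not GGUF");
```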


Supported Architectures

LM-Kit.NET supports a wide range of GGUF model architectures:

  • General text: llama, qwen2, qwen3, gemma2, gemma3, gemma4, mistral3, phi3, deepseek2, falcon-h1, glm4, granite, smollm3, nomic-bert
  • Mixture of Experts: qwen3 MoE variants
  • Vision: qwen2vl, qwen3vl, paddleocr
  • Embeddings: bert, gemma-embedding
  • Speech: whisper
  • Specialized: u2net (image segmentation)

If your model uses one of these architectures and is in GGUF format, it will work in LM-Kit.NET regardless of where you downloaded it.
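
If you are unsure which architecture a downloaded file uses, the name is stored in the file itself under the general.architecture metadata key. A minimal sketch of a GGUF metadata reader, assuming the documented GGUF v3 layout (the demo writes a tiny synthetic file rather than using a real model; this is not a full parser):

```csharp
using System;
using System.IO;
using System.Text;

// Minimal GGUF metadata reader: extracts "general.architecture", the key
// that names the model architecture (e.g. "llama", "qwen2").
static string? ReadArchitecture(string path)
{
    using var r = new BinaryReader(File.OpenRead(path));
    if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "GGUF") return null;
    r.ReadUInt32();                 // format version
    r.ReadUInt64();                 // tensor count
    ulong kvCount = r.ReadUInt64(); // metadata key/value count

    for (ulong i = 0; i < kvCount; i++)
    {
        string key = ReadStr(r);
        uint type = r.ReadUInt32();
        if (key == "general.architecture" && type == 8) // type 8 = string
            return ReadStr(r);
        Skip(r, type);
    }
    return null;

    static string ReadStr(BinaryReader r)
        => Encoding.UTF8.GetString(r.ReadBytes(checked((int)r.ReadUInt64())));

    static void Skip(BinaryReader r, uint type)
    {
        switch (type)
        {
            case 0u or 1u or 7u: r.ReadBytes(1); break;    // uint8 / int8 / bool
            case 2u or 3u: r.ReadBytes(2); break;          // uint16 / int16
            case 4u or 5u or 6u: r.ReadBytes(4); break;    // uint32 / int32 / float32
            case 10u or 11u or 12u: r.ReadBytes(8); break; // uint64 / int64 / float64
            case 8u: ReadStr(r); break;                    // string
            case 9u:                                       // array: element type + count + elements
            {
                uint elemType = r.ReadUInt32();
                ulong count = r.ReadUInt64();
                for (ulong j = 0; j < count; j++) Skip(r, elemType);
                break;
            }
            default: throw new InvalidDataException($"Unknown GGUF value type {type}");
        }
    }
}

// Demo with a synthetic single-key file; a real call would point at a downloaded .gguf.
using (var w = new BinaryWriter(File.Create("tiny.gguf")))
{
    w.Write(Encoding.ASCII.GetBytes("GGUF"));
    w.Write(3u);  // version
    w.Write(0UL); // tensor count
    w.Write(1UL); // one metadata key
    w.Write((ulong)"general.architecture".Length);
    w.Write(Encoding.UTF8.GetBytes("general.architecture"));
    w.Write(8u);  // value type: string
    w.Write((ulong)"llama".Length);
    w.Write(Encoding.UTF8.GetBytes("llama"));
}
Console.WriteLine(ReadArchitecture("tiny.gguf")); // prints "llama"
```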


Format Requirements

LM-Kit.NET requires GGUF format. Models in other formats need conversion before use:

  • safetensors: convert with llama.cpp's convert_hf_to_gguf.py script
  • PyTorch (.bin, .pt): convert with llama.cpp's conversion tools
  • GPTQ: download or convert to a GGUF quantization instead
  • AWQ: download or convert to a GGUF quantization instead

All standard GGUF quantizations are supported: Q2_K through Q8_0, and F16/F32 for full precision.
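
To get a feel for what each quantization costs on disk, a rough size estimator can help. The bits-per-weight figures below are approximations based on llama.cpp's published numbers, not LM-Kit.NET data, and real file sizes vary by architecture:

```csharp
using System;
using System.Collections.Generic;

// Rough file-size estimates for common GGUF quantizations.
// Bits-per-weight values are approximate.
var bitsPerWeight = new Dictionary<string, double>
{
    ["Q2_K"] = 2.6, ["Q4_K_M"] = 4.8, ["Q5_K_M"] = 5.7,
    ["Q6_K"] = 6.6, ["Q8_0"] = 8.5, ["F16"] = 16.0, ["F32"] = 32.0,
};

static double EstimateGiB(double paramCountBillions, double bpw)
    => paramCountBillions * 1e9 * bpw / 8 / (1024.0 * 1024 * 1024);

foreach ((string quant, double bpw) in bitsPerWeight)
    Console.WriteLine($"{quant,-7} ~{EstimateGiB(3.2, bpw):F1} GiB for a 3.2B-parameter model");
```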


Finding GGUF Models on Hugging Face

Many model publishers provide GGUF versions on Hugging Face. Look for:

  • Repositories with "GGUF" in the name (e.g., bartowski/Llama-3.2-3B-Instruct-GGUF)
  • Files ending in .gguf in the repository's file list
  • The Q4_K_M quantization, a good balance of quality and size

The URL format for direct download is:

https://huggingface.co/{publisher}/{repository}/resolve/main/{filename}.gguf
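
A small helper can assemble this URL from its parts. The helper name and parameters are illustrative, not part of the LM-Kit.NET API:

```csharp
using System;

// Assemble a Hugging Face direct-download URL from its parts.
static Uri GgufDownloadUri(string publisher, string repository,
                           string fileName, string revision = "main")
    => new Uri($"https://huggingface.co/{publisher}/{repository}/resolve/{revision}/{fileName}");

Uri uri = GgufDownloadUri(
    "bartowski", "Llama-3.2-3B-Instruct-GGUF", "Llama-3.2-3B-Instruct-Q4_K_M.gguf");
Console.WriteLine(uri);
// prints https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf
```

Passing a commit hash as revision instead of "main" pins an exact file version.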

Version Pinning

To lock a specific model version, use the commit hash instead of main in the URL:

using LM model = new LM(new Uri(
    "https://huggingface.co/lm-kit/qwen3.5-9b-instruct-lmk/resolve/abc123def456/Qwen3.5-9B-Instruct-Q4_K_M.lmk"
));

This ensures you always load the exact same file, even if the publisher updates the repository.

