📦 Loading Models
Any model in the catalog below can be loaded by its Model ID:
var model = LM.LoadFromModelID("gemma3:4b");
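In application code this call is typically wrapped with basic error handling, since the first load for a given Model ID may trigger a download of the weights. A minimal sketch, assuming a console host and the `LMKit.Model` namespace (only `LM.LoadFromModelID` is taken from the snippet above):

```csharp
using System;
using LMKit.Model; // assumed namespace exposing the LM class

class Program
{
    static void Main()
    {
        try
        {
            // Resolve the catalog entry and load it
            // (the first call may download the model file).
            var model = LM.LoadFromModelID("gemma3:4b");
            Console.WriteLine("Model loaded.");
        }
        catch (Exception ex)
        {
            // Surface download/IO failures instead of crashing the host app.
            Console.Error.WriteLine($"Failed to load model: {ex.Message}");
        }
    }
}
```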

LM-Kit Model Catalog

| Model | Model ID | Capabilities | Context (tokens) | Params | Format | License | Download | Notes |
|---|---|---|---|---|---|---|---|---|
| BAAI bge m3 | bge-m3 | Text Embeddings | 8,192 | 0.57 B | GGUF | mit | bge-m3-Q4_K_M.gguf | |
| BAAI bge m3 reranker v2 | bge-m3-reranker | Text Reranking | 8,192 | 0.57 B | GGUF | apache-2.0 | Bge-M3-568M-Q4_K_M.gguf | |
| BAAI bge small en v1.5 | bge-small | Text Embeddings | 512 | 0.03 B | GGUF | mit | bge-small-en-v1.5-f16.gguf | |
| DeepSeek Coder V2 Lite | deepseek-coder-v2:16b | Code Completion | 163,840 | 15.71 B | GGUF | deepseek | DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf | |
| DeepSeek R1 Distill Llama | deepseek-r1:8b | Text Generation, Chat, Code Completion, Math | 131,072 | 8.03 B | GGUF | mit | DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf | |
| Google Gemma Embedding 300M | embeddinggemma-300m | Text Embeddings | 2,058 | 0.03 B | GGUF | gemma | embeddinggemma-300M-Q4_K_M.gguf | |
| TII Falcon 3 Instruct | falcon3:3b | Text Generation, Chat, Code Completion, Math | 32,768 | 3.23 B | GGUF | falcon-llm-license | Falcon3-3B-Instruct-q4_k_m.gguf | |
| TII Falcon 3 Instruct | falcon3:7b | Text Generation, Chat, Code Completion, Math | 32,768 | 7.62 B | GGUF | falcon-llm-license | Falcon-3-7.6B-Instruct-Q4_K_M.gguf | |
| TII Falcon 3 Instruct | falcon3:10b | Text Generation, Chat, Code Completion, Math | 32,768 | 10.31 B | GGUF | falcon-llm-license | Falcon3-10B-Instruct-q4_k_m.gguf | |
| Google Gemma 2 | gemma2:2b | Text Generation, Chat | 8,192 | 2.61 B | GGUF | gemma | gemma-2-2B-Q4_K_M.gguf | Replaced by gemma3:1b |
| Google Gemma 2 | gemma2:9b | Text Generation, Chat | 8,192 | 9.24 B | GGUF | gemma | gemma-2-9B-Q4_K_M.gguf | Replaced by gemma3:4b |
| Google Gemma 2 | gemma2:27b | Text Generation, Chat | 8,192 | 27.23 B | GGUF | gemma | gemma-2-27B-Q4_K_M.gguf | Replaced by gemma3:27b |
| Google Gemma 3 | gemma3:1b | Text Generation, Chat | 32,768 | 1.00 B | GGUF | gemma | gemma-3-it-1B-Q4_K_M.gguf | |
| Google Gemma 3 | gemma3:4b | Text Generation, Chat, Code Completion, Math, Vision | 131,072 | 3.88 B | GGUF | gemma | gemma-3-4b-it-Q4_K_M.lmk | |
| Google Gemma 3 | gemma3:12b | Text Generation, Chat, Code Completion, Math, Vision | 131,072 | 11.77 B | GGUF | gemma | gemma-3-12b-it-Q4_K_M.lmk | |
| Google Gemma 3 | gemma3:27b | Text Generation, Chat, Code Completion, Math, Vision | 131,072 | 27.01 B | GGUF | gemma | gemma-3-27b-it-Q4_K_M.lmk | |
| Google Gemma 3 270M | gemma3:270m | Text Generation, Chat | 32,768 | 0.27 B | GGUF | gemma | gemma-3-270M-it-Q4_K_M.gguf | |
| OpenAI Gpt OSS | gptoss:20b | Text Generation, Chat, Code Completion, Math, Reasoning, Tools Call | 131,072 | 20.91 B | GGUF | apache-2.0 | gpt-oss-20b-mxfp4.gguf | |
| IBM Granite 3.1 Dense Instruct | granite3.1-dense:2b | Text Generation, Chat, Code Completion | 131,072 | 2.53 B | GGUF | apache-2.0 | granite-3.1-2.5B-Q4_K_M.gguf | Replaced by granite4-h:3b |
| IBM Granite 3.1 Dense Instruct | granite3.1-dense:8b | Text Generation, Chat, Code Completion | 131,072 | 8.17 B | GGUF | apache-2.0 | granite-3.1-8.2B-Q4_K_M.gguf | Replaced by granite4-h:7b |
| IBM Granite 3.3 Instruct | granite3.3:2b | Text Generation, Chat, Code Completion | 131,072 | 2.53 B | GGUF | apache-2.0 | granite-3.3-8B-Instruct-Q4_K_M.gguf | Replaced by granite4-h:3b |
| IBM Granite 3.3 Instruct | granite3.3:8b | Text Generation, Chat, Code Completion | 131,072 | 8.17 B | GGUF | apache-2.0 | granite-3.3-2B-Instruct-Q4_K_M.gguf | Replaced by granite4-h:7b |
| IBM Granite 4 Micro Instruct | granite4-h:3b | Text Generation, Chat, Code Completion, Tools Call | 1,048,576 | 3.19 B | GGUF | apache-2.0 | Granite-4.0-H-Micro-3.2B-Q4_K_M.gguf | |
| IBM Granite 4 Tiny Instruct | granite4-h:7b | Text Generation, Chat, Code Completion, Tools Call | 1,048,576 | 6.94 B | GGUF | apache-2.0 | Granite-4.0-H-Tiny-64x994M-Q4_K_M.gguf | |
| LightOn LightOnOCR 1025 | lightonocr1025:1b | Text Generation, Chat | 8,192 | 1.16 B | GGUF | apache-2.0 | lightonocr-1b-1025-Q4_K_M.lmk | |
| Meta Llama 3.1 Instruct | llama3.1 | Text Generation, Chat, Tools Call | 131,072 | 8.03 B | GGUF | llama3.1 | Llama-3.1-8B-Instruct-Q4_K_M.gguf | |
| Meta Llama 3.2 Instruct | llama3.2:1b | Text Generation, Chat | 131,072 | 1.24 B | GGUF | llama3.2 | Llama-3.2-1B-Instruct-Q4_K_M.gguf | |
| Meta Llama 3.2 Instruct | llama3.2:3b | Text Generation, Chat | 131,072 | 3.21 B | GGUF | llama3.2 | Llama-3.2-3B-Instruct-Q4_K_M.gguf | |
| Meta Llama 3.3 Instruct | llama3.3 | Text Generation, Chat, Code Completion, Math, Tools Call | 131,072 | 70.55 B | GGUF | llama3.3 | Llama-3.3-70B-Instruct-Q4_K_M.gguf | |
| LM-Kit Sarcasm Detection V1 | lmkit-sarcasm-detection | Sentiment Analysis | 2,048 | 1.10 B | GGUF | lm-kit | LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf | |
| LM-Kit Sentiment Analysis V2 | lmkit-sentiment-analysis | Sentiment Analysis | 131,072 | 1.24 B | GGUF | lm-kit | lm-kit-sentiment-analysis-2.0-1b-q4.gguf | |
| LM-Kit Tasks Preview | lmkit-tasks:4b-preview | Text Generation, Chat, Code Completion, Math, Vision | 131,072 | 3.88 B | LMK | lmkit | lmkit-tasks-4b-preview.lmk | |
| Mistral Magistral Small 1.1 | magistral-small | Text Generation, Chat, Code Completion, Math, Reasoning | 40,960 | 23.57 B | GGUF | apache-2.0 | Magistral-Small-2506-Q4_K_M.gguf | Replaced by magistral-small1.2 |
| Mistral Magistral Small 1.2 | magistral-small1.2 | Text Generation, Chat, Math, Reasoning, Tools Call | 40,960 | 23.57 B | GGUF | apache-2.0 | Magistral-Small-2509-Q4_K_M.gguf | |
| OpenBMB MiniCPM o 2.6 Vision | minicpm-o | Text Generation, Chat, Vision | 32,768 | 8.12 B | LMK | OpenBMB | MiniCPM-o-V-2.6-Q4_K_M.lmk | |
| OpenBMB MiniCPM 2.6 Vision | minicpm-v | Text Generation, Chat, Vision | 32,768 | 8.12 B | LMK | OpenBMB | MiniCPM-V-2.6-Q4_K_M.lmk | Replaced by minicpm-o |
| OpenBMB MiniCPM-V 4.5 | minicpm-v-45 | Text Generation, Chat, Vision | 40,960 | 8.72 B | LMK | OpenBMB | minicpm-v-4.5-8b.lmk | |
| Mistral Ministral 3 | ministral3:3b | Text Generation, Chat, Math, Vision, Tools Call | 262,144 | 3.85 B | LMK | apache-2.0 | ministral-3-3b-instruct-Q4_K_M.lmk | |
| Mistral Ministral 3 | ministral3:8b | Text Generation, Chat, Math, Vision, Tools Call | 262,144 | 8.92 B | LMK | apache-2.0 | ministral-3-8b-instruct-Q4_K_M.lmk | |
| Mistral Ministral 3 | ministral3:14b | Text Generation, Chat, Math, Vision, Tools Call | 262,144 | 13.95 B | LMK | apache-2.0 | ministral-3-14b-instruct-Q4_K_M.lmk | |
| Mistral Nemo Instruct 2407 | mistral-nemo | Text Generation, Chat | 1,024,000 | 12.25 B | GGUF | apache-2.0 | Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf | Replaced by ministral3:8b |
| Mistral Small Instruct 2501 | mistral-small | Text Generation, Chat, Code Completion, Math | 32,768 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-Instruct-2501-24B-Q4_K_M.gguf | Replaced by mistral-small3.2 |
| Mistral Small 3.1 Instruct 2503 | mistral-small3.1 | Text Generation, Chat, Code Completion, Math | 131,072 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf | Replaced by mistral-small3.2 |
| Mistral Small 3.2 Instruct 2503 | mistral-small3.2 | Text Generation, Chat, Code Completion, Math, Tools Call | 131,072 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf | |
| Nomic embed text v1.5 | nomic-embed-text | Text Embeddings | 2,048 | 0.14 B | GGUF | apache-2.0 | nomic-embed-text-1.5-Q4_K_M.gguf | |
| Nomic embed vision v1.5 | nomic-embed-vision | Image Embeddings | 197 | 0.09 B | ONNX | apache-2.0 | nomic-embed-vision-1.5-Q8.lmk | |
| Microsoft Phi 3.5 Mini Instruct | phi3.5 | Text Generation, Chat | 131,072 | 3.82 B | GGUF | mit | Phi-3.5-mini-Instruct-Q4_K_M.gguf | Replaced by phi4-mini |
| Microsoft Phi 4 Instruct | phi4 | Text Generation, Chat, Math, Tools Call | 16,384 | 14.66 B | GGUF | mit | Phi-4-14.7B-Instruct-Q4_K_M.gguf | |
| Microsoft Phi 4 Mini Instruct | phi4-mini | Text Generation, Chat, Tools Call | 131,072 | 3.84 B | GGUF | mit | Phi-4-mini-Instruct-Q4_K_M.gguf | |
| Mistral Pixtral | pixtral | Text Generation, Chat, Vision | 1,024,000 | 12.68 B | LMK | apache-2.0 | pixtral-12B-Q4_K_M.lmk | |
| Alibaba Qwen 2 Vision Instruct | qwen2-vl:2b | Text Generation, Chat, Vision | 32,768 | 2.21 B | LMK | apache-2.0 | Qwen2-VL-2B-Instruct-Q4_K_M.lmk | Replaced by qwen3-vl:2b |
| Alibaba Qwen 2 Vision Instruct | qwen2-vl:8b | Text Generation, Chat, Vision | 32,768 | 8.29 B | LMK | apache-2.0 | Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk | Replaced by qwen3-vl:8b |
| Alibaba Qwen 2.5 Instruct | qwen2.5:0.5b | Text Generation, Chat | 32,768 | 0.49 B | GGUF | apache-2.0 | Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf | Replaced by qwen3:0.6b |
| Alibaba Qwen 2.5 Instruct | qwen2.5:3b | Text Generation, Chat | 32,768 | 3.09 B | GGUF | qwen-research | Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf | Replaced by qwen3:4b |
| Alibaba Qwen 2.5 Instruct | qwen2.5:7b | Text Generation, Chat | 32,768 | 7.62 B | GGUF | apache-2.0 | Qwen-2.5-7B-Instruct-Q4_K_M.gguf | Replaced by qwen3:8b |
| Alibaba Qwen 2.5 Vision Instruct | qwen2.5-vl:3b | Text Generation, Chat, Vision | 128,000 | 3.75 B | LMK | qwen-research | Qwen2.5-VL-3B-Instruct-Q4_K_M.lmk | Replaced by qwen3-vl:4b |
| Alibaba Qwen 2.5 Vision Instruct | qwen2.5-vl:7b | Text Generation, Chat, Vision | 128,000 | 8.29 B | LMK | apache-2.0 | Qwen2.5-VL-7B-Instruct-Q4_K_M.lmk | Replaced by qwen3-vl:8b |
| Alibaba Qwen 2.5 Vision Instruct | qwen2.5-vl:32b | Text Generation, Chat, Vision | 128,000 | 33.45 B | LMK | apache-2.0 | Qwen2.5-VL-32B-Instruct-Q4_K_M.lmk | |
| Alibaba Qwen 3 Instruct | qwen3:0.6b | Text Generation, Chat | 40,960 | 0.75 B | GGUF | apache-2.0 | Qwen3-0.6B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Instruct | qwen3:1.7b | Text Generation, Chat | 40,960 | 2.03 B | GGUF | apache-2.0 | Qwen3-1.7B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Instruct | qwen3:4b | Text Generation, Chat, Math, Reasoning, Tools Call | 40,960 | 4.02 B | GGUF | apache-2.0 | Qwen3-4B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Instruct | qwen3:8b | Text Generation, Chat, Math, Reasoning, Tools Call | 40,960 | 8.19 B | GGUF | apache-2.0 | Qwen3-8B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Instruct | qwen3:14b | Text Generation, Chat, Math, Reasoning, Tools Call | 40,960 | 14.77 B | GGUF | apache-2.0 | Qwen3-14B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Embedding | qwen3-embedding:0.6b | Text Embeddings | 32,768 | 0.60 B | GGUF | apache-2.0 | Qwen3-Embedding-0.6B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Embedding | qwen3-embedding:4b | Text Embeddings | 40,960 | 4.02 B | GGUF | apache-2.0 | Qwen3-Embedding-4B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Embedding | qwen3-embedding:8b | Text Embeddings | 40,960 | 7.57 B | GGUF | apache-2.0 | Qwen3-Embedding-8B-Q4_K_M.gguf | |
| Alibaba Qwen 3 Vision Instruct | qwen3-vl:2b | Text Generation, Chat, Code Completion, Math, Vision, Tools Call | 262,144 | 2.13 B | LMK | apache-2.0 | qwen3-vl-2b-instruct-Q4_K_M.lmk | |
| Alibaba Qwen 3 Vision Instruct | qwen3-vl:4b | Text Generation, Chat, Code Completion, Math, Vision, Tools Call | 262,144 | 2.13 B | LMK | apache-2.0 | qwen3-vl-4b-instruct-Q4_K_M.lmk | |
| Alibaba Qwen 3 Vision Instruct | qwen3-vl:8b | Text Generation, Chat, Code Completion, Math, Vision, Tools Call | 262,144 | 8.77 B | LMK | apache-2.0 | qwen3-vl-8b-instruct-Q4_K_M.lmk | |
| Alibaba Qwen 3 Vision Instruct | qwen3-vl:30b | Text Generation, Chat, Code Completion, Math, Vision, Tools Call | 262,144 | 31.07 B | LMK | apache-2.0 | qwen3-vl-30b-instruct-Q4_K_M.lmk | |
| Alibaba Qwen QwQ | qwq | Text Generation, Chat, Math, Reasoning, Tools Call | 40,960 | 32.76 B | GGUF | apache-2.0 | QwQ-32B-Q4_K_M.gguf | |
| HuggingFace SmolLM3 | smollm3:3b | Text Generation, Chat, Code Completion, Math | 65,536 | 3.08 B | GGUF | apache-2.0 | SmolLM3-3B-Q4_K_M.gguf | |
| U2-Net 44M | u2net | Image Segmentation | 0 | 0.04 B | LMK | apache-2.0 | u2-net-F32.lmk | |
| OpenAI Whisper Base | whisper-base | Speech-to-Text | 1,500 | 0.07 B | GGML | mit | whisper-base-q8_0.bin | |
| OpenAI Whisper Large Turbo V3 | whisper-large-turbo3 | Speech-to-Text | 1,500 | 0.81 B | GGML | mit | whisper-large-v3-turbo-q8_0.bin | |
| OpenAI Whisper Large V3 | whisper-large3 | Speech-to-Text | 1,500 | 1.54 B | GGML | mit | whisper-large-v3-q8_0.bin | |
| OpenAI Whisper Medium | whisper-medium | Speech-to-Text | 1,500 | 0.76 B | GGML | mit | whisper-medium-q8_0.bin | |
| OpenAI Whisper Small | whisper-small | Speech-to-Text | 1,500 | 0.24 B | GGML | mit | whisper-small-q8_0.bin | |
| OpenAI Whisper Tiny | whisper-tiny | Speech-to-Text | 1,500 | 0.04 B | GGML | mit | whisper-tiny-q8_0.bin | |

Model Details

BAAI bge m3 (bge-m3)

Description: A unified, multilingual embedding model that delivers dense, sparse, and multi-vector retrieval on texts from short queries up to 8,192-token documents in over 100 languages.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: bert
  • Context Length: 8,192 tokens
  • Parameter Count: 566,703,104
  • Quantization Precision: 4-bit
  • File Size: 417.50 MB
  • Format: GGUF
  • License: mit
  • SHA256: e251234fcb7d050991a6be491952f485bf5c641dd10c3272dc1301fd281ad50f
BAAI bge m3 reranker v2 (bge-m3-reranker)

Description: A unified, multilingual reranker that ingests query–document pairs and directly produces sigmoid-normalized relevance scores across over 100 languages.

Specifications:

  • Capabilities: Text Reranking
  • Architecture: bert
  • Context Length: 8,192 tokens
  • Parameter Count: 567,753,729
  • Quantization Precision: 4-bit
  • File Size: 418.07 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: ce947cece730cbf7d836da8c5490a9987ef0f919014b9275e7ce9aa12d96e6d9
BAAI bge small en v1.5 (bge-small)

Description: An efficient, CPU-friendly English embedding model (BAAI General Embedding) designed for lightweight applications.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: bert
  • Context Length: 512 tokens
  • Parameter Count: 33,212,160
  • Quantization Precision: 16-bit
  • File Size: 64.45 MB
  • Format: GGUF
  • License: mit
  • SHA256: cd5790da23df71e7e20fe20bb523bd4586a533a4ee813cc562e32b37929141c1
DeepSeek Coder V2 Lite 15.7B (deepseek-coder-v2:16b)

Description: An open-source mixture-of-experts code model tailored for code completion tasks. Early evaluations indicated competitive performance relative to leading code models.

Specifications:

  • Capabilities: Code Completion
  • Architecture: deepseek2
  • Context Length: 163,840 tokens
  • Parameter Count: 15,706,484,224
  • Quantization Precision: 4-bit
  • File Size: 9.65 GB
  • Format: GGUF
  • License: deepseek
  • SHA256: ac398e8c1c670d3c362d3c1182614916bab7c364708ec073fcf947f6802d509e
DeepSeek R1 Distill Llama 8B (deepseek-r1:8b)

Description: DeepSeek-R1 enhances its predecessor by integrating cold-start data to overcome repetition and readability issues, achieving state-of-the-art performance in math, code, and reasoning tasks, with all models open-sourced.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 8,030,261,312
  • Quantization Precision: 4-bit
  • File Size: 4.58 GB
  • Format: GGUF
  • License: mit
  • SHA256: 596fce705423e44831fe63367a30ccc7b36921c1bfdd4b9dfde85a5aa97ac2ef
Google Gemma Embedding 300M (embeddinggemma-300m)

Description: EmbeddingGemma 300M is an open, state-of-the-art-for-its-size embedding model from Google DeepMind (Gemma 3, T5Gemma-initialized). It produces 768-dimensional text vectors for search/retrieval, classification, clustering, and semantic similarity across 100+ languages, and supports Matryoshka Representation Learning (truncate to 512/256/128 with re-normalization). Optimized for on-device/CPU deployment.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: gemma-embedding
  • Context Length: 2,058 tokens
  • Parameter Count: 33,212,160
  • Quantization Precision: 4-bit
  • File Size: 288.83 MB
  • Format: GGUF
  • License: gemma
  • SHA256: 3d55e7fe66eb4c7b2d01b4fbd30c00dc7a101bd6c9f724a6e7e5cfaa87968420
TII Falcon 3 Instruct 3.2B (falcon3:3b)

Description: Designed for multilingual tasks including chat, text generation, and code completion, supporting extended context lengths.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 32,768 tokens
  • Parameter Count: 3,227,655,168
  • Quantization Precision: 4-bit
  • File Size: 1.87 GB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: 81c6b52d221c2f0eea3db172fc74de28534f2fd15f198ecbfcc55577d20cbf8a
TII Falcon 3 Instruct 7.6B (falcon3:7b)

Description: Offers robust performance across chat, text generation, and mathematical reasoning tasks with extended context support.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 32,768 tokens
  • Parameter Count: 7,615,616,512
  • Quantization Precision: 4-bit
  • File Size: 4.26 GB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: 4ce1da546d76e04ce77eb076556eb25e1096faf6155ee429245e4bfa3f5ddf5d
TII Falcon 3 Instruct 10.3B (falcon3:10b)

Description: A larger variant tailored for multilingual dialogue, code completion, and complex reasoning tasks with extended context support.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 32,768 tokens
  • Parameter Count: 10,305,653,760
  • Quantization Precision: 4-bit
  • File Size: 5.86 GB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: a0c0edbd35019ff26d972a0373b25b4c8d72315395a3b6036aca5e6bafa3d819
Google Gemma 2 2.6B (gemma2:2b)

Description: A lightweight decoder-only model from Google, available in both pre-trained and instruction-tuned variants for text-to-text tasks.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: gemma2
  • Context Length: 8,192 tokens
  • Parameter Count: 2,614,341,888
  • Quantization Precision: 4-bit
  • File Size: 1.59 GB
  • Format: GGUF
  • License: gemma
  • SHA256: 362d09c1496e035ecf0737d8fe03e8e607c61e57e16b22cedd158525f6721e06

⚠️ Replaced by: gemma3:1b

Google Gemma 2 9.2B (gemma2:9b)

Description: A decoder-only text-to-text model from Google, offering competitive performance in both pre-trained and instruction-tuned configurations.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: gemma2
  • Context Length: 8,192 tokens
  • Parameter Count: 9,241,705,984
  • Quantization Precision: 4-bit
  • File Size: 5.37 GB
  • Format: GGUF
  • License: gemma
  • SHA256: b6059a960d2f4f881630f1e795b40f7e09e5e12d3a6b1900474d6108ea880afd

⚠️ Replaced by: gemma3:4b

Google Gemma 2 27.2B (gemma2:27b)

Description: A larger variant in the Gemma 2 family, optimized for text generation and instruction following with open weights provided.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: gemma2
  • Context Length: 8,192 tokens
  • Parameter Count: 27,227,128,320
  • Quantization Precision: 4-bit
  • File Size: 15.50 GB
  • Format: GGUF
  • License: gemma
  • SHA256: bb4b276745da743d550720dc2e6c847498eef45e7b82a4d5a73ef6636f78027a

⚠️ Replaced by: gemma3:27b

Google Gemma 3 1B (gemma3:1b)

Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: gemma3
  • Context Length: 32,768 tokens
  • Parameter Count: 999,885,952
  • Quantization Precision: 4-bit
  • File Size: 768.72 MB
  • Format: GGUF
  • License: gemma
  • SHA256: bacfe3de6eee9fba412d5c0415630172c2a602dae26bb353e1b20dd67194a226
Google Gemma 3 3.9B (gemma3:4b)

Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision
  • Architecture: gemma3
  • Context Length: 131,072 tokens
  • Parameter Count: 3,880,099,328
  • Quantization Precision: 4-bit
  • File Size: 2.87 GB
  • Format: GGUF
  • License: gemma
  • SHA256: abb283e96c0abf58468a18127ce6e8b2bfc98e48f1ec618f658495c09254bdae
Google Gemma 3 11.8B (gemma3:12b)

Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision
  • Architecture: gemma3
  • Context Length: 131,072 tokens
  • Parameter Count: 11,765,788,416
  • Quantization Precision: 4-bit
  • File Size: 7.35 GB
  • Format: GGUF
  • License: gemma
  • SHA256: d6f01cdb4369769ea87c5211a7fd865e12dbb9e2a937b43ef281a5b7e9ba2e35
Google Gemma 3 27.2B (gemma3:27b)

Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision
  • Architecture: gemma3
  • Context Length: 131,072 tokens
  • Parameter Count: 27,009,002,240
  • Quantization Precision: 4-bit
  • File Size: 15.97 GB
  • Format: GGUF
  • License: gemma
  • SHA256: 2d0e4382259ae2da28b9c0342e982a58eafbddad7c05bbfe6e104f2b3c165994
Google Gemma 3 270M (gemma3:270m)

Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: gemma3
  • Context Length: 32,768 tokens
  • Parameter Count: 268,098,176
  • Quantization Precision: 4-bit
  • File Size: 241.39 MB
  • Format: GGUF
  • License: gemma
  • SHA256: e28b323bc75925d6edc8d3f030268830bf53c59c296d77278ac24653403d9d47
OpenAI Gpt OSS 20B (gptoss:20b)

Description: OpenAI’s medium-sized open-weight Mixture-of-Experts model (≈21B params; ~3.6B active per token). This MXFP4 GGUF build is optimized for local inference, supports long context (131k), strong reasoning & tool use, and can run on consumer GPUs (~16GB VRAM).

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Reasoning, Tools Call
  • Architecture: gpt-oss
  • Context Length: 131,072 tokens
  • Parameter Count: 20,914,757,184
  • Quantization Precision: 4-bit
  • File Size: 11.28 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 52f57ab7d3df3ba9173827c1c6832e73375553a846f3e32b49f1ae2daad688d4
IBM Granite 3.1 Dense Instruct 2.5B (granite3.1-dense:2b)

Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion
  • Architecture: granite
  • Context Length: 131,072 tokens
  • Parameter Count: 2,533,531,648
  • Quantization Precision: 4-bit
  • File Size: 1.44 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: ba05b36d0a8cebf8ccd13bbbb904bebe182f4854fbcff19cd1ee54bc82bbd298

⚠️ Replaced by: granite4-h:3b

IBM Granite 3.1 Dense Instruct 8.2B (granite3.1-dense:8b)

Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion
  • Architecture: granite
  • Context Length: 131,072 tokens
  • Parameter Count: 8,170,848,256
  • Quantization Precision: 4-bit
  • File Size: 4.60 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: d1ada98d7b274fc6b119bd19b8d3536cd006544e9aae06db6f8b2c256d584044

⚠️ Replaced by: granite4-h:7b

IBM Granite 3.3 Instruct 2.5B (granite3.3:2b)

Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion
  • Architecture: granite
  • Context Length: 131,072 tokens
  • Parameter Count: 2,533,539,840
  • Quantization Precision: 4-bit
  • File Size: 1.44 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: dbe4dd51bd6c1e39f96c831bf086454c9b313bd1c279ebb7166f2a37d86598da

⚠️ Replaced by: granite4-h:3b

IBM Granite 3.3 Instruct 8.2B (granite3.3:8b)

Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion
  • Architecture: granite
  • Context Length: 131,072 tokens
  • Parameter Count: 8,170,864,640
  • Quantization Precision: 4-bit
  • File Size: 4.60 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 1c890e740d7ecb010716a858eda315c01ac5bb0edfaf68bf17118868a26bb8ff

⚠️ Replaced by: granite4-h:7b

IBM Granite 4 Micro Instruct 3.2B (granite4-h:3b)

Description: Hybrid long-context instruct model (Mamba-2 + attention) finetuned from Granite-4.0-H-Micro-Base with SFT, RL alignment, and model merging. Delivers stronger instruction following and robust tool/function calling in multilingual dialog (en, de, es, fr, ja, pt, ar, cs, it, ko, nl, zh), with 1M-token context for enterprise assistants. Excels at summarization, classification, extraction, QA/RAG, and code—including FIM—and supports structured chat templates and OpenAI-style tool schemas.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Tools Call
  • Architecture: granitehybrid
  • Context Length: 1,048,576 tokens
  • Parameter Count: 3,191,396,096
  • Quantization Precision: 4-bit
  • File Size: 1.81 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: dbe7b747aa49340f80629811652636b55f4ca4cbbb92ee7e17c442d8a1130566
IBM Granite 4 Tiny Instruct 6.9B (granite4-h:7b)

Description: Granite-4.0-H-Tiny is an ~7B-parameter hybrid (attention + Mamba2) MoE decoder with a 1M-token context, instruction-tuned (SFT, RL alignment, model merging) for enterprise assistants. It improves instruction following and tool/function calling, supports multilingual dialog (en, de, es, fr, ja, pt, ar, cs, it, ko, nl, zh), and excels at summarization, classification, extraction, QA/RAG, and code/FIM tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Tools Call
  • Architecture: granitehybrid
  • Context Length: 1,048,576 tokens
  • Parameter Count: 6,939,037,248
  • Quantization Precision: 4-bit
  • File Size: 3.94 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 75234a50a38235dd4c891dae1a702ccf47a5d89da751d38f90a43be4794f18fb
LightOn LightOnOCR 1025 1B (lightonocr1025:1b)

Description: LightOnOCR-1B-1025 is a compact, end-to-end vision–language model for high-accuracy OCR and document understanding, delivering fast, layout-aware text extraction from complex documents.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen3
  • Context Length: 8,192 tokens
  • Parameter Count: 1,161,230,336
  • Quantization Precision: 4-bit
  • File Size: 710.36 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 42646c9408fd59f342dec352bfc3c84c2255b03e79500506ee9d320e2f66be37
Meta Llama 3.1 Instruct 8B (llama3.1)

Description: A multilingual generative model optimized for dialogue and text generation tasks. Designed for robust performance on common benchmarks.

Specifications:

  • Capabilities: Text Generation, Chat, Tools Call
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 8,030,261,312
  • Quantization Precision: 4-bit
  • File Size: 4.58 GB
  • Format: GGUF
  • License: llama3.1
  • SHA256: ad00fe50a62d1e009b4e06cd57ab55c9a30cbf5e7f183de09115d75ada73bd5b
Meta Llama 3.2 Instruct 1.2B (llama3.2:1b)

Description: A multilingual instruct-tuned model optimized for dialogue, retrieval, and summarization tasks.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 1,235,814,432
  • Quantization Precision: 4-bit
  • File Size: 770.28 MB
  • Format: GGUF
  • License: llama3.2
  • SHA256: 88725e821cf35f1a0dbeaa4a3bebeb91e6c6b6a9d50f808ab42d64233284cce1
Meta Llama 3.2 Instruct 3.2B (llama3.2:3b)

Description: A multilingual dialogue model with robust text generation and summarization capabilities.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 3,212,749,888
  • Quantization Precision: 4-bit
  • File Size: 1.88 GB
  • Format: GGUF
  • License: llama3.2
  • SHA256: 6810bf3cce69d440a22b85a3b3e28f57c868f1c98686abd995f1dc5d9b955cfe
Meta Llama 3.3 Instruct 70.6B (llama3.3)

Description: A large multilingual generative model optimized for dialogue, text tasks, code completion, and mathematical reasoning with extended context support.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Tools Call
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 70,553,706,560
  • Quantization Precision: 4-bit
  • File Size: 39.60 GB
  • Format: GGUF
  • License: llama3.3
  • SHA256: 57f78fe3b141afa56406278265656524c51c9837edb3537ad43708b6d4ecc04d
LM-Kit Sarcasm Detection V1 1.1B (lmkit-sarcasm-detection)

Description: Optimized for detecting sarcasm in English text within the LM-Kit framework. Suitable for CPU-based inference.

Specifications:

  • Capabilities: Sentiment Analysis
  • Architecture: llama
  • Context Length: 2,048 tokens
  • Parameter Count: 1,100,048,384
  • Quantization Precision: 4-bit
  • File Size: 636.88 MB
  • Format: GGUF
  • License: lm-kit
  • SHA256: cc82abd224dba9c689b19d368db6078d6167ca84897b21870d7d6a2c0f09d7d0
LM-Kit Sentiment Analysis V2 1.2B (lmkit-sentiment-analysis)

Description: Designed for multilingual sentiment analysis tasks, this LM-Kit model is optimized for efficient CPU-based inference.

Specifications:

  • Capabilities: Sentiment Analysis
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 1,235,814,432
  • Quantization Precision: 4-bit
  • File Size: 770.28 MB
  • Format: GGUF
  • License: lm-kit
  • SHA256: e12f4abf6453a8431985ce1d6350c265cd58b25210156a917e3608c850fd7add
LM-Kit Tasks 4B Preview (lmkit-tasks:4b-preview)

Description: A 4B-parameter Gemma3-based model optimized for LM-Kit tasks. Achieves state-of-the-art performance in classification, structured data extraction, language detection, and sentiment analysis, while also supporting chat, embeddings, text generation, code completion, math reasoning, and vision understanding. Designed for seamless integration into LM-Kit pipelines to deliver efficient, reliable, and high-quality results across domains.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision
  • Architecture: gemma3
  • Context Length: 131,072 tokens
  • Parameter Count: 3,880,099,328
  • Quantization Precision: 4-bit
  • File Size: 3.09 GB
  • Format: LMK
  • License: lmkit
  • SHA256: 3ec9fe4622e2d9a050b3d2c7d2244a911aab75372b04a7bc30bb72a05bdd645c
Mistral Magistral Small 1.1 24B (magistral-small)

Description: Builds upon Mistral Small 3.1 (2503) with added reasoning capabilities: SFT from Magistral Medium traces with RL on top yields a small, efficient 24B-parameter reasoning model.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Reasoning
  • Architecture: llama
  • Context Length: 40,960 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13.35 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 7680ba6895d405340f1461cb835a147055689a37d88b193cc5a365aaea76da9e

⚠️ Replaced by: magistral-small1.2

Mistral Magistral Small 1.2 24B (magistral-small1.2)

Description: Magistral Small 1.2 (2509) builds upon Mistral Small 3.2 (2506) with added reasoning via SFT from Magistral Medium traces and RL, special [THINK]/[/THINK] tokens, and a 128K context. This GGUF release is text-only (no vision encoder) and should be paired with mistral-common for the correct chat template.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Reasoning, Tools Call
  • Architecture: llama
  • Context Length: 40,960 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13.35 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: d3a024d29e0e8f35d9353f5d4f08fc3715406835c8ae1328ce2f3bd212a43434
OpenBMB MiniCPM o 2.6 Vision 8.1B (minicpm-o)

Description: An end-to-end multimodal model supporting real-time speech, image, and text understanding. Offers enhanced performance for conversational tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2
  • Context Length: 32,768 tokens
  • Parameter Count: 8,116,736,752
  • Quantization Precision: 4-bit
  • File Size: 5.00 GB
  • Format: LMK
  • License: OpenBMB
  • SHA256: 6fd17ed1f46bfcddb5a3482dd882dd022a46aa8c33cb93d75f809cd4d118ab53

OpenBMB MiniCPM 2.6 Vision 8.1B (minicpm-v)

Description: A multimodal model for vision and text tasks, built on the SigLIP vision encoder and the Qwen language backbone.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2
  • Context Length: 32,768 tokens
  • Parameter Count: 8,116,736,752
  • Quantization Precision: 4-bit
  • File Size: 5.00 GB
  • Format: LMK
  • License: OpenBMB
  • SHA256: a10b1aa434899ea0bd5bb5e281f622fed0b02434241d53435fce05773fa7cfa8

⚠️ Replaced by: minicpm-o

OpenBMB MiniCPM-V 4.5 8B (minicpm-v-45)

Description: MiniCPM-V 4.5 is a state-of-the-art multimodal LLM built on Qwen3-8B and SigLIP2-400M. It delivers GPT-4o-level performance for single-image, multi-image, and high-FPS video understanding on local devices. The model supports controllable fast/deep thinking, real-time speech and text comprehension, strong OCR and document parsing (up to 1.8M pixels), and multilingual capabilities in 30+ languages. Optimized for efficiency, it enables CPU inference, mobile deployment, and scalable usage through formats like LMK, GGUF, and AWQ.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 8,715,965,680
  • Quantization Precision: 4-bit
  • File Size: 5.70 GB
  • Format: LMK
  • License: OpenBMB
  • SHA256: 000c56809f033e53637f364461cfadb8c4aa09e533a3fde66de39cbb41bf5cb7

Mistral Ministral 3 3B (ministral3:3b)

Description: Smallest member of the Ministral 3 family, an edge-optimized multilingual instruct model with a 256K context window and solid reasoning and code capabilities for constrained hardware.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Vision, Tools Call
  • Architecture: mistral3
  • Context Length: 262,144 tokens
  • Parameter Count: 3,849,093,120
  • Quantization Precision: 4-bit
  • File Size: 2.42 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 300ce4373c14a3c1d68e37a7f4537b98776eb8ccac50ff32b30cc8ae6191ee96

Mistral Ministral 3 8B (ministral3:8b)

Description: Mid-sized Ministral 3 variant that balances quality and cost, offering strong multilingual reasoning, math, and code performance with a 256K context while remaining practical for single-GPU and edge deployments.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Vision, Tools Call
  • Architecture: mistral3
  • Context Length: 262,144 tokens
  • Parameter Count: 8,918,030,336
  • Quantization Precision: 4-bit
  • File Size: 5.27 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 4bd03de58774150e6d19e9a1f0e4f1e010784ac7b98801c35144f5fd796c81c8

Mistral Ministral 3 14B (ministral3:14b)

Description: Flagship Ministral 3 instruct model delivering frontier-level multilingual reasoning, math, and code performance in a 256K-token context, with design tuned for efficient edge and single-GPU deployment.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Vision, Tools Call
  • Architecture: mistral3
  • Context Length: 262,144 tokens
  • Parameter Count: 13,945,036,800
  • Quantization Precision: 4-bit
  • File Size: 8.11 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 032759d8b1814347ae571a2dd6d24c2d36760141f07756a6559bb77a17a9e821

Mistral Nemo Instruct 2407 12.2B (mistral-nemo)

Description: An instruct-tuned variant developed in collaboration with NVIDIA, balancing model size with performance for conversational tasks.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: llama
  • Context Length: 1,024,000 tokens
  • Parameter Count: 12,247,782,400
  • Quantization Precision: 4-bit
  • File Size: 6.96 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 579ab8f5178f5900d0c4e14534929aa0dba97e3f97be76b31ebe537ffd6cf169

⚠️ Replaced by: ministral3:8b

Mistral Small Instruct 2501 24B (mistral-small)

Description: Optimized for local deployment, this model balances parameter count and performance for chat and code tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 32,768 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13.35 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 4395b5c6136e29e9b11bdba2ee189302ad45dd5c3ef45073b729f077b8f0cec8

⚠️ Replaced by: mistral-small3.2

Mistral Small 3.1 Instruct 2503 24B (mistral-small3.1)

Description: Mistral Small 3.1 (24B) enhances Mistral Small 3 with advanced vision, 128k context, multilingual support, agentic features, and efficient local deployment.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13.35 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 68922ff3a311c81bc4e983f86e665a12213ee84710c210522f10e65ce980bda7

⚠️ Replaced by: mistral-small3.2

Mistral Small 3.2 Instruct 2506 24B (mistral-small3.2)

Description: Mistral Small 3.2 (24B) builds on Mistral Small 3.1 with improved instruction following and more robust function calling, retaining advanced vision, 128k context, multilingual support, agentic features, and efficient local deployment.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Tools Call
  • Architecture: llama
  • Context Length: 131,072 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13.35 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: e1e9a516a90387ec98bb9c45c37dbb1478008d1fa46b216cca893cf008d92c29

Nomic embed text v1.5 (nomic-embed-text)

Description: Provides flexible production embeddings using Matryoshka Representation Learning.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: nomic-bert
  • Context Length: 2,048 tokens
  • Parameter Count: 136,731,648
  • Quantization Precision: 4-bit
  • File Size: 85.86 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 1a60949a331b30bb754ad60b7bdff80d8e563a56b3f7f3f1aed68db8c143003e

Nomic embed vision v1.5 (nomic-embed-vision)

Description: ViT-B/16-based image embedding model trained on 1.5B image-text pairs using Matryoshka Representation Learning. Outputs 768-dim embeddings aligned with Nomic Embed Text v1.5 for multimodal search, retrieval, and zero-shot classification.

Specifications:

  • Capabilities: Image Embeddings
  • Architecture: ViT-B/16
  • Context Length: 197 tokens
  • Parameter Count: 92,384,769
  • Quantization Precision: 8-bit
  • File Size: 92.26 MB
  • Format: ONNX
  • License: apache-2.0
  • SHA256: 4f6f6a765625a4b74ec3e62141b7b83e1db1fb904afeda1fa00c1fefefbcc714

Microsoft Phi 3.5 Mini Instruct 3.8B (phi3.5)

Description: A lightweight model optimized for reasoning-dense tasks and extended context support. Designed for efficient instruction following.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: phi3
  • Context Length: 131,072 tokens
  • Parameter Count: 3,821,079,648
  • Quantization Precision: 4-bit
  • File Size: 2.23 GB
  • Format: GGUF
  • License: mit
  • SHA256: 782c34ae79564d1d92bd44dec233182559b3ecf6fedee44417e2a28c89bd0721

⚠️ Replaced by: phi4-mini

Microsoft Phi 4 Instruct 14.7B (phi4)

Description: An enhanced generative model trained on a diverse dataset to improve instruction adherence and reasoning capabilities.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Tools Call
  • Architecture: phi3
  • Context Length: 16,384 tokens
  • Parameter Count: 14,659,507,200
  • Quantization Precision: 4-bit
  • File Size: 8.43 GB
  • Format: GGUF
  • License: mit
  • SHA256: 03af8f5c5a87d526047f5c20c99e32bbafd5db6dbfdee8d498d0fe1a3c45af55

Microsoft Phi 4 Mini Instruct 3.8B (phi4-mini)

Description: A lightweight open model from the Phi-4 family that uses synthetic and curated public data for reasoning-dense outputs, supports a 128K token context, and is enhanced through fine-tuning and preference optimization for precise instruction adherence and robust safety.

Specifications:

  • Capabilities: Text Generation, Chat, Tools Call
  • Architecture: phi3
  • Context Length: 131,072 tokens
  • Parameter Count: 3,836,021,856
  • Quantization Precision: 4-bit
  • File Size: 2.32 GB
  • Format: GGUF
  • License: mit
  • SHA256: 556492e72efc8d33406b236830ad38d25669482ea7ad91fc643de237e942b9f9

Mistral Pixtral 12B (pixtral)

Description: Pixtral 12B is a natively multimodal model that combines a 12B-parameter decoder with a 400M-parameter vision encoder, trained on interleaved image–text data with support for variable image sizes. It offers state-of-the-art performance in its weight class across multimodal and text-only benchmarks and supports ultra-long 128K sequence lengths.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: llama
  • Context Length: 1,024,000 tokens
  • Parameter Count: 12,682,744,832
  • Quantization Precision: 4-bit
  • File Size: 7.39 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 28d42e60b5f33765ac6f3882abc4c7fd9f5a7955910ff117c13dbfc5aa6bf159

Alibaba Qwen 2 Vision Instruct 2.2B (qwen2-vl:2b)

Description: A multilingual vision-language model featuring dynamic resolution processing for advanced image and long-video understanding.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2vl
  • Context Length: 32,768 tokens
  • Parameter Count: 2,208,985,700
  • Quantization Precision: 4-bit
  • File Size: 1.27 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: b4e546acfd2271f5a0960b64445cae1091e5fc4192d74db72ae57c28729bd0b8

⚠️ Replaced by: qwen3-vl:2b

Alibaba Qwen 2 Vision Instruct 8.3B (qwen2-vl:8b)

Description: An extended variant in the Qwen 2 Vision family for multilingual vision-language tasks, including advanced video analysis.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2vl
  • Context Length: 32,768 tokens
  • Parameter Count: 8,291,375,716
  • Quantization Precision: 4-bit
  • File Size: 4.72 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 90b3eb60611559ba7521590ecccdf1d2a4dfab007566221c6a42f19b91b48686

⚠️ Replaced by: qwen3-vl:8b

Alibaba Qwen 2.5 Instruct 0.5B (qwen2.5:0.5b)

Description: A compact variant from the Alibaba Qwen 2.5 family, optimized for instruction following across chat, embeddings, and text generation tasks.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen2
  • Context Length: 32,768 tokens
  • Parameter Count: 494,032,768
  • Quantization Precision: 4-bit
  • File Size: 379.38 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 09b44ff0ef0a160ffe50778c0828754201bb3a40522a941839c23acfbc9ceec0

⚠️ Replaced by: qwen3:0.6b

Alibaba Qwen 2.5 Instruct 3.1B (qwen2.5:3b)

Description: A mid-sized model from the Alibaba Qwen 2.5 series, designed for diverse tasks including chat, embeddings, and text generation.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen2
  • Context Length: 32,768 tokens
  • Parameter Count: 3,085,938,688
  • Quantization Precision: 4-bit
  • File Size: 1.80 GB
  • Format: GGUF
  • License: qwen-research
  • SHA256: fb88cca2303e7f7d4d52679d633efe66d9c3e3555573b4444abe5ab8af4a97f7

⚠️ Replaced by: qwen3:4b

Alibaba Qwen 2.5 Instruct 7.6B (qwen2.5:7b)

Description: A larger variant from the Alibaba Qwen 2.5 series that supports extended context and multiple tasks including chat, embeddings, and text generation.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen2
  • Context Length: 32,768 tokens
  • Parameter Count: 7,615,616,512
  • Quantization Precision: 4-bit
  • File Size: 4.36 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 2bf11b8a7d566bddfcc2b222ed7b918afc51239c5f919532de8b9403981ad866

⚠️ Replaced by: qwen3:8b

Alibaba Qwen 2.5 Vision Instruct 3B (qwen2.5-vl:3b)

Description: Qwen2.5 VL 3B Instruct is a compact vision-language chat model that delivers advanced object and text/chart understanding, agentic tool-driven interactions, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, powered by an optimized ViT encoder with dynamic temporal training.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2vl
  • Context Length: 128,000 tokens
  • Parameter Count: 3,754,622,976
  • Quantization Precision: 4-bit
  • File Size: 2.58 GB
  • Format: LMK
  • License: qwen-research
  • SHA256: 78fee4fde9f7fd93e1365cae46668184a259b1bd2a3169915a4a1e7495f859f8

⚠️ Replaced by: qwen3-vl:4b

Alibaba Qwen 2.5 Vision Instruct 7B (qwen2.5-vl:7b)

Description: Qwen2.5 VL 7B Instruct is a next-generation vision-language chat model that combines advanced object and text/chart understanding, agentic tool use, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, all powered by a streamlined ViT encoder with dynamic temporal training.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2vl
  • Context Length: 128,000 tokens
  • Parameter Count: 8,292,166,656
  • Quantization Precision: 4-bit
  • File Size: 5.16 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: e9a99c7bb06c23bd60594cebf8a881af13f502742df3047eaa3b466c747f7453

⚠️ Replaced by: qwen3-vl:8b

Alibaba Qwen 2.5 Vision Instruct 32B (qwen2.5-vl:32b)

Description: Qwen2.5 VL 32B Instruct is a next-generation vision-language chat model that combines advanced object and text/chart understanding, agentic tool use, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, all powered by a streamlined ViT encoder with dynamic temporal training.

Specifications:

  • Capabilities: Text Generation, Chat, Vision
  • Architecture: qwen2vl
  • Context Length: 128,000 tokens
  • Parameter Count: 33,452,718,336
  • Quantization Precision: 4-bit
  • File Size: 5.16 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 9081e05fc2832177162b9ea8ccde1e0fdb1d8ed429a838527af36de966e2fb92

Alibaba Qwen 3 Instruct 0.6B (qwen3:0.6b)

Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 751,632,384
  • Quantization Precision: 4-bit
  • File Size: 461.79 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 2b1a7ed56061ad1275847412f61e8e009ada37ef865dccc25747dcc76eea9811

Alibaba Qwen 3 Instruct 1.7B (qwen3:1.7b)

Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.

Specifications:

  • Capabilities: Text Generation, Chat
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 2,031,739,904
  • Quantization Precision: 4-bit
  • File Size: 1.19 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: b047d6617eba56dcfa3357566b06807f54b15816faf6182aabd12d7e2378e537

Alibaba Qwen 3 Instruct 4B (qwen3:4b)

Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Reasoning, Tools Call
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 4,022,468,096
  • Quantization Precision: 4-bit
  • File Size: 2.33 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 9dbc1e801f001ea316a627bb867fdd192fc3b36046fd69e160155ddc12129dbe

Alibaba Qwen 3 Instruct 8B (qwen3:8b)

Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Reasoning, Tools Call
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 8,190,735,360
  • Quantization Precision: 4-bit
  • File Size: 4.68 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: b9059e3978453f50a8e9e45a825243abdb8739b2f4623e541fd5a392d9672c0f

Alibaba Qwen 3 Instruct 14B (qwen3:14b)

Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Reasoning, Tools Call
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 14,768,307,200
  • Quantization Precision: 4-bit
  • File Size: 8.38 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 520369028ee99e4a3ca413a35126337038a8da561927f81322c1b34aed10e03d
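The listed file sizes track the 4-bit quantization closely. As a rough sanity check, dividing file size by parameter count for the qwen3:8b entry above gives the effective bits per weight (treating GB as decimal gigabytes is an assumption; the catalog does not state its unit):

```csharp
using System;

double fileBytes = 4.68e9;           // qwen3:8b file size (assuming decimal GB)
double parameters = 8_190_735_360;   // qwen3:8b parameter count

// Effective storage cost per weight; slightly above 4 bits is expected,
// since K-quant formats keep some tensors at higher precision.
double bitsPerWeight = fileBytes * 8 / parameters;
Console.WriteLine($"{bitsPerWeight:F2} bits/weight");
```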

Alibaba Qwen 3 Embedding 0.6B (qwen3-embedding:0.6b)

Description: Lightweight member of the Qwen3 Embedding series, optimized for fast, low-resource semantic search and ranking while preserving strong multilingual and long-context understanding.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: qwen3
  • Context Length: 32,768 tokens
  • Parameter Count: 595,776,512
  • Quantization Precision: 4-bit
  • File Size: 609.54 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: b624c62027986bc4181eadcad0cee479916c498d1039f7063195fd4c14803023

Alibaba Qwen 3 Embedding 4B (qwen3-embedding:4b)

Description: Mid-size Qwen3 Embedding model offering a strong accuracy–efficiency trade-off for multilingual retrieval, reranking, classification, clustering, and bitext mining with instruction-aware embeddings.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 4,021,774,336
  • Quantization Precision: 4-bit
  • File Size: 2.33 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: ac48f080498db5874835a0c6db52aa8a726f8f88fca1dbbb26fc51d5311acb85

Alibaba Qwen 3 Embedding 8B (qwen3-embedding:8b)

Description: Flagship Qwen3 Embedding model delivering state-of-the-art multilingual and cross-lingual embeddings for dense retrieval, reranking, text/code search, clustering, and classification, with flexible output dimensions.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: qwen3
  • Context Length: 40,960 tokens
  • Parameter Count: 7,567,295,488
  • Quantization Precision: 4-bit
  • File Size: 4.36 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 3822cc88f2f3e9a08c4b9bae87261a7e94c503fe0372ad9b6c5b80161886291a
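Embedding models load through the same catalog-ID mechanism as chat models; pick the size that fits your latency and quality budget. A sketch following the pattern at the top of the page:

```csharp
// The 0.6B variant is the lightest option in the Qwen 3 Embedding series.
var embedder = LM.LoadFromModelID("qwen3-embedding:0.6b");
```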

Alibaba Qwen 3 Vision Instruct 2B (qwen3-vl:2b)

Description: Qwen3-VL is the latest Qwen vision-language family with stronger text understanding/generation, deeper visual reasoning, native 256K context (expandable), upgraded OCR (32 languages), long-video comprehension, and agentic GUI/tool use. The 2B Instruct edition targets edge devices for multimodal chat, grounding, and document/image understanding.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision, Tools Call
  • Architecture: qwen3vl
  • Context Length: 262,144 tokens
  • Parameter Count: 2,127,532,032
  • Quantization Precision: 4-bit
  • File Size: 1.29 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 296c414827f80a0205371f2d407169f71457c4b9f9ed49edd1b28e8b9f697ace

Alibaba Qwen 3 Vision Instruct 4B (qwen3-vl:4b)

Description: Qwen3-VL 4B Instruct balances efficiency and quality with advanced spatial perception (2D/3D grounding), timestamp-aligned video reasoning, and “visual coding” (HTML/CSS/JS from images). Suited for on-device or small-GPU multimodal assistants, retrieval, and structured understanding.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision, Tools Call
  • Architecture: qwen3vl
  • Context Length: 262,144 tokens
  • Parameter Count: 2,127,532,032
  • Quantization Precision: 4-bit
  • File Size: 2.59 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 30c7ff027b6b148950533fbc6cf473abc288245af61cd23ca3ae135b6db9f3e8

Alibaba Qwen 3 Vision Instruct 8B (qwen3-vl:8b)

Description: Qwen3-VL 8B Instruct is the highest-quality dense variant, delivering state-of-the-art multimodal reasoning, long-horizon video understanding, stronger recognition (celebrities, products, flora/fauna, etc.), and robust agent/tool interaction—ideal for high-fidelity VLM chat and STEM tasks.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision, Tools Call
  • Architecture: qwen3vl
  • Context Length: 262,144 tokens
  • Parameter Count: 8,767,123,696
  • Quantization Precision: 4-bit
  • File Size: 5.21 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 76031476ae3f8720e01a941e655981c7b6779b5709a746879d771534a3d0ccdf

Alibaba Qwen 3 Vision Instruct 30B (qwen3-vl:30b)

Description: Qwen3-VL 30B Instruct is the flagship MoE vision-language model in the Qwen3 family, delivering state-of-the-art multimodal reasoning, long-horizon video and document understanding with a native 256K context, precise spatial grounding and timestamp-aligned event localization, expanded OCR in 32 languages, and powerful visual-agent capabilities for GUI control, visual coding, and tool-augmented STEM and math workflows.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math, Vision, Tools Call
  • Architecture: qwen3vlmoe
  • Context Length: 262,144 tokens
  • Parameter Count: 31,070,754,032
  • Quantization Precision: 4-bit
  • File Size: 17.79 GB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 955041de8ecc63327b6a5fa408ccf03e80e62e663b88f5314aa2901c332b7478

Alibaba Qwen QwQ 32.5B (qwq)

Description: QwQ is a reasoning-focused model in the Qwen series that significantly outperforms conventional instruction-tuned models on challenging tasks, with QwQ-32B demonstrating competitive performance compared to top reasoning models like DeepSeek-R1 and o1-mini.

Specifications:

  • Capabilities: Text Generation, Chat, Math, Reasoning, Tools Call
  • Architecture: qwen2
  • Context Length: 40,960 tokens
  • Parameter Count: 32,763,876,352
  • Quantization Precision: 4-bit
  • File Size: 18.49 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17

HuggingFace SmolLM3 3B (smollm3:3b)

Description: SmolLM3 is a 3B-parameter language model designed to push the boundaries of small models. It supports six languages, advanced reasoning, and long context. SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale.

Specifications:

  • Capabilities: Text Generation, Chat, Code Completion, Math
  • Architecture: smollm3
  • Context Length: 65,536 tokens
  • Parameter Count: 3,075,098,624
  • Quantization Precision: 4-bit
  • File Size: 1.78 GB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 03cc959aceff388ca737e4714a20cf4b3ef403116c9759a2a99f504dec40294e

U2-Net 44M (u2net)

Description: U²-Net, a two-level nested U-structure network for salient object detection and general image segmentation; a lightweight encoder–decoder built from RSU (ReSidual U) blocks.

Specifications:

  • Capabilities: Image Segmentation
  • Architecture: u2net
  • Context Length: 0 tokens
  • Parameter Count: 44,000,000
  • Quantization Precision: 32-bit
  • File Size: 167.85 MB
  • Format: LMK
  • License: apache-2.0
  • SHA256: bfc5e34225e3c8d3b5c3ffd3b128c7d7e6bb17de9bde56b3a6d0654de5e73661

OpenAI Whisper Base (whisper-base)

Description: A balanced Whisper model delivering moderate resource use with reliable transcription accuracy.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 72,593,920
  • Quantization Precision: 8-bit
  • File Size: 77.98 MB
  • Format: GGML
  • License: mit
  • SHA256: c577b9a86e7e048a0b7eada054f4dd79a56bbfa911fbdacf900ac5b567cbb7d9

OpenAI Whisper Large Turbo V3 (whisper-large-turbo3)

Description: A turbo-optimized Whisper large v3 variant for faster transcription with near-v3 accuracy.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 808,878,080
  • Quantization Precision: 8-bit
  • File Size: 833.69 MB
  • Format: GGML
  • License: mit
  • SHA256: 317eb69c11673c9de1e1f0d459b253999804ec71ac4c23c17ecf5fbe24e259a1

OpenAI Whisper Large V3 (whisper-large3)

Description: The largest Whisper v3 model providing state-of-the-art transcription accuracy across varied audio.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 1,543,490,560
  • Quantization Precision: 8-bit
  • File Size: 1.54 GB
  • Format: GGML
  • License: mit
  • SHA256: 37efc6b68f300ab717465685f7c3e175a66c11cf92bb3ab9912e86f4116c465e

OpenAI Whisper Medium (whisper-medium)

Description: A medium-sized Whisper model offering high-quality transcription for diverse audio scenarios.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 763,857,920
  • Quantization Precision: 8-bit
  • File Size: 785.23 MB
  • Format: GGML
  • License: mit
  • SHA256: 42a1ffcbe4167d224232443396968db4d02d4e8e87e213d3ee2e03095dea6502

OpenAI Whisper Small (whisper-small)

Description: A small Whisper model providing improved transcription fidelity while remaining efficient.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 241,734,912
  • Quantization Precision: 8-bit
  • File Size: 252.21 MB
  • Format: GGML
  • License: mit
  • SHA256: 49c8fb02b65e6049d5fa6c04f81f53b867b5ec9540406812c643f177317f779f

OpenAI Whisper Tiny (whisper-tiny)

Description: The smallest Whisper variant offering fast, lightweight speech-to-text transcription.

Specifications:

  • Capabilities: Speech-to-Text
  • Architecture: whisper
  • Context Length: 1,500 tokens
  • Parameter Count: 37,760,640
  • Quantization Precision: 8-bit
  • File Size: 41.52 MB
  • Format: GGML
  • License: mit
  • SHA256: c2085835d3f50733e2ff6e4b41ae8a2b8d8110461e18821b09a15c40c42d1cca
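Every entry above lists a SHA256 digest, which can be checked against a downloaded file before use. A minimal sketch using the .NET standard library (the file path is a placeholder; the expected digest is taken from the gemma3:4b entry):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

string path = "gemma-3-4b-it-Q4_K_M.lmk"; // placeholder: path to the downloaded file
string expected = "3ec9fe4622e2d9a050b3d2c7d2244a911aab75372b04a7bc30bb72a05bdd645c";

// Stream the file through SHA-256 and compare against the catalog value.
using var sha = SHA256.Create();
using var stream = File.OpenRead(path);
string actual = Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();

Console.WriteLine(actual == expected ? "checksum OK" : "checksum MISMATCH");
```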