A model can be loaded by its Model ID:

```csharp
var model = LM.LoadFromModelID("gemma3:4b");
```
LM-Kit Model Catalog
Model & ID | Capabilities | Context | Params. | Format | License | Download | Details
---|---|---|---|---|---|---|---
BAAI bge m3 (bge-m3) | Embeddings | 8192 | 0.57 B | GGUF | mit | bge-m3-Q4_K_M.gguf | details
BAAI bge m3 reranker v2 (bge-m3-reranker) | Text Reranking | 8192 | 0.57 B | GGUF | apache-2.0 | Bge-M3-568M-Q4_K_M.gguf | details
BAAI bge small en v1.5 (bge-small) | Embeddings | 512 | 0.03 B | GGUF | mit | bge-small-en-v1.5-f16.gguf | details
DeepSeek Coder V2 Lite (deepseek-coder-v2:16b) | Code Completion | 163840 | 15.71 B | GGUF | deepseek | DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf | details
DeepSeek R1 Distill Llama (deepseek-r1:8b) | Text Generation, Chat, Code Completion, Math | 131072 | 8.03 B | GGUF | mit | DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf | details
TII Falcon 3 Instruct (falcon3:3b) | Text Generation, Chat, Code Completion, Math | 32768 | 3.23 B | GGUF | falcon-llm-license | Falcon3-3B-Instruct-q4_k_m.gguf | details
TII Falcon 3 Instruct (falcon3:7b) | Text Generation, Chat, Code Completion, Math | 32768 | 7.62 B | GGUF | falcon-llm-license | Falcon-3-7.6B-Instruct-Q4_K_M.gguf | details
TII Falcon 3 Instruct (falcon3:10b) | Text Generation, Chat, Code Completion, Math | 32768 | 10.31 B | GGUF | falcon-llm-license | Falcon3-10B-Instruct-q4_k_m.gguf | details
Google Gemma 2 (gemma2:2b) | Text Generation, Chat | 8192 | 2.61 B | GGUF | gemma | gemma-2-2B-Q4_K_M.gguf | Replaced by gemma3:1b
Google Gemma 2 (gemma2:9b) | Text Generation, Chat | 8192 | 9.24 B | GGUF | gemma | gemma-2-9B-Q4_K_M.gguf | Replaced by gemma3:4b
Google Gemma 2 (gemma2:27b) | Text Generation, Chat | 8192 | 27.23 B | GGUF | gemma | gemma-2-27B-Q4_K_M.gguf | Replaced by gemma3:27b
Google Gemma 3 (gemma3:1b) | Text Generation, Chat | 32768 | 1.00 B | GGUF | gemma | gemma-3-it-1B-Q4_K_M.gguf | details
Google Gemma 3 (gemma3:4b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 3.88 B | GGUF | gemma | gemma-3-4b-it-Q4_K_M.lmk | details
Google Gemma 3 (gemma3:12b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 11.77 B | GGUF | gemma | gemma-3-12b-it-Q4_K_M.lmk | details
Google Gemma 3 (gemma3:27b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 27.01 B | GGUF | gemma | gemma-3-27b-it-Q4_K_M.lmk | details
IBM Granite 3.1 Dense Instruct (granite3.1-dense:2b) | Text Generation, Chat, Code Completion | 131072 | 2.53 B | GGUF | apache-2.0 | granite-3.1-2.5B-Q4_K_M.gguf | Replaced by granite3.3:2b
IBM Granite 3.1 Dense Instruct (granite3.1-dense:8b) | Text Generation, Chat, Code Completion | 131072 | 8.17 B | GGUF | apache-2.0 | granite-3.1-8.2B-Q4_K_M.gguf | Replaced by granite3.3:8b
IBM Granite 3.3 Instruct (granite3.3:2b) | Text Generation, Chat, Code Completion | 131072 | 2.53 B | GGUF | apache-2.0 | granite-3.3-2B-Instruct-Q4_K_M.gguf | details
IBM Granite 3.3 Instruct (granite3.3:8b) | Text Generation, Chat, Code Completion | 131072 | 8.17 B | GGUF | apache-2.0 | granite-3.3-8B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.1 Instruct (llama3.1) | Text Generation, Chat | 131072 | 8.03 B | GGUF | llama3.1 | Llama-3.1-8B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.2 Instruct (llama3.2:1b) | Text Generation, Chat | 131072 | 1.24 B | GGUF | llama3.2 | Llama-3.2-1B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.2 Instruct (llama3.2:3b) | Text Generation, Chat | 131072 | 3.21 B | GGUF | llama3.2 | Llama-3.2-3B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.3 Instruct (llama3.3) | Text Generation, Chat, Code Completion, Math | 131072 | 70.55 B | GGUF | llama3.3 | Llama-3.3-70B-Instruct-Q4_K_M.gguf | details
LM-Kit Sarcasm Detection V1 (lmkit-sarcasm-detection) | Sentiment Analysis | 2048 | 1.10 B | GGUF | lm-kit | LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf | details
LM-Kit Sentiment Analysis V2 (lmkit-sentiment-analysis) | Sentiment Analysis | 131072 | 1.24 B | GGUF | lm-kit | lm-kit-sentiment-analysis-2.0-1b-q4.gguf | details
OpenBMB MiniCPM o 2.6 Vision (minicpm-o) | Text Generation, Chat, Vision | 32768 | 8.12 B | LMK | OpenBMB | MiniCPM-o-V-2.6-Q4_K_M.lmk | details
OpenBMB MiniCPM 2.6 Vision (minicpm-v) | Text Generation, Chat, Vision | 32768 | 8.12 B | LMK | OpenBMB | MiniCPM-V-2.6-Q4_K_M.lmk | Replaced by minicpm-o
Mistral Nemo Instruct 2407 (mistral-nemo) | Text Generation, Chat | 1024000 | 12.25 B | GGUF | apache-2.0 | Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf | details
Mistral Small Instruct 2501 (mistral-small) | Text Generation, Chat, Code Completion, Math | 32768 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-Instruct-2501-24B-Q4_K_M.gguf | Replaced by mistral-small3.1
Mistral Small 3.1 Instruct 2503 (mistral-small3.1) | Text Generation, Chat, Code Completion, Math | 131072 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf | details
Nomic embed text v1.5 (nomic-embed-text) | Embeddings | 2048 | 0.14 B | GGUF | apache-2.0 | nomic-embed-text-1.5-Q4_K_M.gguf | details
Nomic embed vision v1.5 (nomic-embed-vision) | Embeddings | 197 | 0.09 B | ONNX | apache-2.0 | nomic-embed-vision-1.5-Q8.lmk | details
Microsoft Phi 3.5 Mini Instruct (phi3.5) | Text Generation, Chat | 131072 | 3.82 B | GGUF | mit | Phi-3.5-mini-Instruct-Q4_K_M.gguf | Replaced by phi4-mini
Microsoft Phi 4 Instruct (phi4) | Text Generation, Chat, Math | 16384 | 14.66 B | GGUF | mit | Phi-4-14.7B-Instruct-Q4_K_M.gguf | details
Microsoft Phi 4 Mini Instruct (phi4-mini) | Text Generation, Chat | 131072 | 3.84 B | GGUF | mit | Phi-4-mini-Instruct-Q4_K_M.gguf | details
Mistral Pixtral (pixtral) | Text Generation, Chat, Vision | 1024000 | 12.68 B | LMK | apache-2.0 | pixtral-12B-Q4_K_M.lmk | details
Alibaba Qwen 2 Vision Instruct (qwen2-vl:2b) | Text Generation, Chat, Vision | 32768 | 2.21 B | LMK | apache-2.0 | Qwen2-VL-2B-Instruct-Q4_K_M.lmk | Replaced by qwen2.5-vl:3b
Alibaba Qwen 2 Vision Instruct (qwen2-vl:8b) | Text Generation, Chat, Vision | 32768 | 8.29 B | LMK | apache-2.0 | Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk | Replaced by qwen2.5-vl:7b
Alibaba Qwen 2.5 Instruct (qwen2.5:0.5b) | Text Generation, Chat | 32768 | 0.49 B | GGUF | apache-2.0 | Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf | Replaced by qwen3:0.6b
Alibaba Qwen 2.5 Instruct (qwen2.5:3b) | Text Generation, Chat | 32768 | 3.09 B | GGUF | qwen-research | Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf | Replaced by qwen3:4b
Alibaba Qwen 2.5 Instruct (qwen2.5:7b) | Text Generation, Chat | 32768 | 7.62 B | GGUF | apache-2.0 | Qwen-2.5-7B-Instruct-Q4_K_M.gguf | Replaced by qwen3:8b
Alibaba Qwen 2.5 Vision Instruct (qwen2.5-vl:3b) | Text Generation, Chat, Vision | 128000 | 3.75 B | LMK | apache-2.0 | Qwen2.5-VL-3B-Instruct-Q4_K_M.lmk | details
Alibaba Qwen 2.5 Vision Instruct (qwen2.5-vl:7b) | Text Generation, Chat, Vision | 128000 | 8.29 B | LMK | apache-2.0 | Qwen2.5-VL-7B-Instruct-Q4_K_M.lmk | details
Alibaba Qwen 3 Instruct (qwen3:0.6b) | Text Generation, Chat | 40960 | 0.75 B | GGUF | apache-2.0 | Qwen3-0.6B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:1.7b) | Text Generation, Chat | 40960 | 2.03 B | GGUF | apache-2.0 | Qwen3-1.7B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:4b) | Text Generation, Chat, Code Completion, Math | 40960 | 4.02 B | GGUF | apache-2.0 | Qwen3-4B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:8b) | Text Generation, Chat, Code Completion, Math | 40960 | 8.19 B | GGUF | apache-2.0 | Qwen3-8B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:14b) | Text Generation, Chat, Code Completion, Math | 40960 | 14.77 B | GGUF | apache-2.0 | Qwen3-14B-Q4_K_M.gguf | details
Alibaba Qwen QwQ (qwq) | Text Generation, Chat, Code Completion, Math | 40960 | 32.76 B | GGUF | apache-2.0 | QwQ-32B-Q4_K_M.gguf | details
Model Details
BAAI bge m3 (bge-m3)
Description: A unified, multilingual embedding model that delivers dense, sparse, and multi-vector retrieval on texts from short queries up to 8,192-token documents in over 100 languages.
Specifications:
- Capabilities: Embeddings
- Architecture: bert
- Context Length: 8192 tokens
- Parameter Count: 566,703,104
- Quantization Precision: 4-bit
- File Size: 417.50 MB
- Format: GGUF
- License: mit
- SHA256:
e251234fcb7d050991a6be491952f485bf5c641dd10c3272dc1301fd281ad50f
Download: bge-m3-Q4_K_M.gguf
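Every entry lists a SHA256 checksum alongside its download, so a weight file can be verified for integrity before loading. The following standalone sketch checks the bge-m3 file against the catalog value; the local path is hypothetical, and this is plain .NET, not an LM-Kit API call.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class VerifyChecksum
{
    static void Main()
    {
        // Hypothetical path to the downloaded weights.
        string path = "bge-m3-Q4_K_M.gguf";
        // Expected SHA256 from the catalog entry above.
        string expected = "e251234fcb7d050991a6be491952f485bf5c641dd10c3272dc1301fd281ad50f";

        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        string actual = Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();

        Console.WriteLine(actual == expected ? "Checksum OK" : "Checksum mismatch");
    }
}
```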
BAAI bge m3 reranker v2 (bge-m3-reranker)
Description: A unified, multilingual reranker that ingests query–document pairs and directly produces sigmoid-normalized relevance scores across over 100 languages.
Specifications:
- Capabilities: Text Reranking
- Architecture: bert
- Context Length: 8192 tokens
- Parameter Count: 567,753,729
- Quantization Precision: 4-bit
- File Size: 418.07 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
ce947cece730cbf7d836da8c5490a9987ef0f919014b9275e7ce9aa12d96e6d9
Download: Bge-M3-568M-Q4_K_M.gguf
BAAI bge small en v1.5 (bge-small)
Description: An efficient, CPU-friendly English embedding model (BAAI General Embedding) designed for lightweight applications.
Specifications:
- Capabilities: Embeddings
- Architecture: bert
- Context Length: 512 tokens
- Parameter Count: 33,212,160
- Quantization Precision: 16-bit
- File Size: 64.45 MB
- Format: GGUF
- License: mit
- SHA256:
cd5790da23df71e7e20fe20bb523bd4586a533a4ee813cc562e32b37929141c1
Download: bge-small-en-v1.5-f16.gguf
DeepSeek Coder V2 Lite 15.7B (deepseek-coder-v2:16b)
Description: An open-source mixture-of-experts code model tailored for code completion tasks. Early evaluations indicated competitive performance relative to leading code models.
Specifications:
- Capabilities: Code Completion
- Architecture: deepseek2
- Context Length: 163840 tokens
- Parameter Count: 15,706,484,224
- Quantization Precision: 4-bit
- File Size: 9884.28 MB
- Format: GGUF
- License: deepseek
- SHA256:
ac398e8c1c670d3c362d3c1182614916bab7c364708ec073fcf947f6802d509e
Download: DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf
DeepSeek R1 Distill Llama 8B (deepseek-r1:8b)
Description: DeepSeek-R1 enhances its predecessor by integrating cold-start data to overcome repetition and readability issues, achieving state-of-the-art performance in math, code, and reasoning tasks, with all models open-sourced.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 8,030,261,312
- Quantization Precision: 4-bit
- File Size: 4692.78 MB
- Format: GGUF
- License: mit
- SHA256:
596fce705423e44831fe63367a30ccc7b36921c1bfdd4b9dfde85a5aa97ac2ef
Download: DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
TII Falcon 3 Instruct 3.2B (falcon3:3b)
Description: Designed for multilingual tasks including chat, text generation, and code completion, supporting extended context lengths.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 3,227,655,168
- Quantization Precision: 4-bit
- File Size: 1912.77 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
81c6b52d221c2f0eea3db172fc74de28534f2fd15f198ecbfcc55577d20cbf8a
Download: Falcon3-3B-Instruct-q4_k_m.gguf
TII Falcon 3 Instruct 7.6B (falcon3:7b)
Description: Offers robust performance across chat, text generation, and mathematical reasoning tasks with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 7,615,616,512
- Quantization Precision: 4-bit
- File Size: 4358.03 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
4ce1da546d76e04ce77eb076556eb25e1096faf6155ee429245e4bfa3f5ddf5d
Download: Falcon-3-7.6B-Instruct-Q4_K_M.gguf
TII Falcon 3 Instruct 10.3B (falcon3:10b)
Description: A larger variant tailored for multilingual dialogue, code completion, and complex reasoning tasks with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 10,305,653,760
- Quantization Precision: 4-bit
- File Size: 5996.25 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
a0c0edbd35019ff26d972a0373b25b4c8d72315395a3b6036aca5e6bafa3d819
Download: Falcon3-10B-Instruct-q4_k_m.gguf
Google Gemma 2 2.6B (gemma2:2b)
Description: A lightweight decoder-only model from Google, available in both pre-trained and instruction-tuned variants for text-to-text tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 2,614,341,888
- Quantization Precision: 4-bit
- File Size: 1629.43 MB
- Format: GGUF
- License: gemma
- SHA256:
362d09c1496e035ecf0737d8fe03e8e607c61e57e16b22cedd158525f6721e06
Replaced by: gemma3:1b
Download: gemma-2-2B-Q4_K_M.gguf
Google Gemma 2 9.2B (gemma2:9b)
Description: A decoder-only text-to-text model from Google, offering competitive performance in both pre-trained and instruction-tuned configurations.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 9,241,705,984
- Quantization Precision: 4-bit
- File Size: 5494.17 MB
- Format: GGUF
- License: gemma
- SHA256:
b6059a960d2f4f881630f1e795b40f7e09e5e12d3a6b1900474d6108ea880afd
Replaced by: gemma3:4b
Download: gemma-2-9B-Q4_K_M.gguf
Google Gemma 2 27.2B (gemma2:27b)
Description: A larger variant in the Gemma 2 family, optimized for text generation and instruction following with open weights provided.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 27,227,128,320
- Quantization Precision: 4-bit
- File Size: 15874.27 MB
- Format: GGUF
- License: gemma
- SHA256:
bb4b276745da743d550720dc2e6c847498eef45e7b82a4d5a73ef6636f78027a
Replaced by: gemma3:27b
Download: gemma-2-27B-Q4_K_M.gguf
Google Gemma 3 1B (gemma3:1b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma3
- Context Length: 32768 tokens
- Parameter Count: 999,885,952
- Quantization Precision: 4-bit
- File Size: 768.72 MB
- Format: GGUF
- License: gemma
- SHA256:
bacfe3de6eee9fba412d5c0415630172c2a602dae26bb353e1b20dd67194a226
Download: gemma-3-it-1B-Q4_K_M.gguf
Google Gemma 3 3.9B (gemma3:4b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 3,880,099,328
- Quantization Precision: 4-bit
- File Size: 2938.40 MB
- Format: GGUF
- License: gemma
- SHA256:
abb283e96c0abf58468a18127ce6e8b2bfc98e48f1ec618f658495c09254bdae
Download: gemma-3-4b-it-Q4_K_M.lmk
Google Gemma 3 11.8B (gemma3:12b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 11,765,788,416
- Quantization Precision: 4-bit
- File Size: 7529.17 MB
- Format: GGUF
- License: gemma
- SHA256:
d6f01cdb4369769ea87c5211a7fd865e12dbb9e2a937b43ef281a5b7e9ba2e35
Download: gemma-3-12b-it-Q4_K_M.lmk
Google Gemma 3 27.2B (gemma3:27b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 27,009,002,240
- Quantization Precision: 4-bit
- File Size: 16350.05 MB
- Format: GGUF
- License: gemma
- SHA256:
2d0e4382259ae2da28b9c0342e982a58eafbddad7c05bbfe6e104f2b3c165994
Download: gemma-3-27b-it-Q4_K_M.lmk
IBM Granite 3.1 Dense Instruct 2.5B (granite3.1-dense:2b)
Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 2,533,531,648
- Quantization Precision: 4-bit
- File Size: 1473.71 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
ba05b36d0a8cebf8ccd13bbbb904bebe182f4854fbcff19cd1ee54bc82bbd298
Replaced by: granite3.3:2b
Download: granite-3.1-2.5B-Q4_K_M.gguf
IBM Granite 3.1 Dense Instruct 8.2B (granite3.1-dense:8b)
Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 8,170,848,256
- Quantization Precision: 4-bit
- File Size: 4713.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
d1ada98d7b274fc6b119bd19b8d3536cd006544e9aae06db6f8b2c256d584044
Replaced by: granite3.3:8b
Download: granite-3.1-8.2B-Q4_K_M.gguf
IBM Granite 3.3 Instruct 2.5B (granite3.3:2b)
Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 2,533,539,840
- Quantization Precision: 4-bit
- File Size: 1473.72 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
dbe4dd51bd6c1e39f96c831bf086454c9b313bd1c279ebb7166f2a37d86598da
Download: granite-3.3-2B-Instruct-Q4_K_M.gguf
IBM Granite 3.3 Instruct 8.2B (granite3.3:8b)
Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 8,170,864,640
- Quantization Precision: 4-bit
- File Size: 4713.89 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
1c890e740d7ecb010716a858eda315c01ac5bb0edfaf68bf17118868a26bb8ff
Download: granite-3.3-8B-Instruct-Q4_K_M.gguf
Meta Llama 3.1 Instruct 8B (llama3.1)
Description: A multilingual generative model optimized for dialogue and text generation tasks. Designed for robust performance on common benchmarks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 8,030,261,312
- Quantization Precision: 4-bit
- File Size: 4692.78 MB
- Format: GGUF
- License: llama3.1
- SHA256:
ad00fe50a62d1e009b4e06cd57ab55c9a30cbf5e7f183de09115d75ada73bd5b
Download: Llama-3.1-8B-Instruct-Q4_K_M.gguf
Meta Llama 3.2 Instruct 1.2B (llama3.2:1b)
Description: A multilingual instruct-tuned model optimized for dialogue, retrieval, and summarization tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 1,235,814,432
- Quantization Precision: 4-bit
- File Size: 770.28 MB
- Format: GGUF
- License: llama3.2
- SHA256:
88725e821cf35f1a0dbeaa4a3bebeb91e6c6b6a9d50f808ab42d64233284cce1
Download: Llama-3.2-1B-Instruct-Q4_K_M.gguf
Meta Llama 3.2 Instruct 3.2B (llama3.2:3b)
Description: A multilingual dialogue model with robust text generation and summarization capabilities.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 3,212,749,888
- Quantization Precision: 4-bit
- File Size: 1925.83 MB
- Format: GGUF
- License: llama3.2
- SHA256:
6810bf3cce69d440a22b85a3b3e28f57c868f1c98686abd995f1dc5d9b955cfe
Download: Llama-3.2-3B-Instruct-Q4_K_M.gguf
Meta Llama 3.3 Instruct 70.6B (llama3.3)
Description: A large multilingual generative model optimized for dialogue, text tasks, code completion, and mathematical reasoning with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 70,553,706,560
- Quantization Precision: 4-bit
- File Size: 40550.61 MB
- Format: GGUF
- License: llama3.3
- SHA256:
57f78fe3b141afa56406278265656524c51c9837edb3537ad43708b6d4ecc04d
Download: Llama-3.3-70B-Instruct-Q4_K_M.gguf
LM-Kit Sarcasm Detection V1 1.1B (lmkit-sarcasm-detection)
Description: Optimized for detecting sarcasm in English text within the LM-Kit framework. Suitable for CPU-based inference.
Specifications:
- Capabilities: Sentiment Analysis
- Architecture: llama
- Context Length: 2048 tokens
- Parameter Count: 1,100,048,384
- Quantization Precision: 4-bit
- File Size: 636.88 MB
- Format: GGUF
- License: lm-kit
- SHA256:
cc82abd224dba9c689b19d368db6078d6167ca84897b21870d7d6a2c0f09d7d0
Download: LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf
LM-Kit Sentiment Analysis V2 1.2B (lmkit-sentiment-analysis)
Description: Designed for multilingual sentiment analysis tasks, this LM-Kit model is optimized for efficient CPU-based inference.
Specifications:
- Capabilities: Sentiment Analysis
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 1,235,814,432
- Quantization Precision: 4-bit
- File Size: 770.28 MB
- Format: GGUF
- License: lm-kit
- SHA256:
e12f4abf6453a8431985ce1d6350c265cd58b25210156a917e3608c850fd7add
Download: lm-kit-sentiment-analysis-2.0-1b-q4.gguf
OpenBMB MiniCPM o 2.6 Vision 8.1B (minicpm-o)
Description: An end-to-end multimodal model supporting real-time speech, image, and text understanding. Offers enhanced performance for conversational tasks.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 8,116,736,752
- Quantization Precision: 4-bit
- File Size: 5120.87 MB
- Format: LMK
- License: OpenBMB
- SHA256:
6fd17ed1f46bfcddb5a3482dd882dd022a46aa8c33cb93d75f809cd4d118ab53
Download: MiniCPM-o-V-2.6-Q4_K_M.lmk
OpenBMB MiniCPM 2.6 Vision 8.1B (minicpm-v)
Description: A multimodal model designed for vision and text tasks, built upon SigLip and Qwen architectures. Evaluate performance against current benchmarks.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 8,116,736,752
- Quantization Precision: 4-bit
- File Size: 5120.70 MB
- Format: LMK
- License: OpenBMB
- SHA256:
a10b1aa434899ea0bd5bb5e281f622fed0b02434241d53435fce05773fa7cfa8
Replaced by: minicpm-o
Download: MiniCPM-V-2.6-Q4_K_M.lmk
Mistral Nemo Instruct 2407 12.2B (mistral-nemo)
Description: An instruct-tuned variant developed in collaboration with NVIDIA, balancing model size with performance for conversational tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 1024000 tokens
- Parameter Count: 12,247,782,400
- Quantization Precision: 4-bit
- File Size: 7130.82 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
579ab8f5178f5900d0c4e14534929aa0dba97e3f97be76b31ebe537ffd6cf169
Download: Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf
Mistral Small Instruct 2501 24B (mistral-small)
Description: Optimized for local deployment, this model balances parameter count and performance for chat and code tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 23,572,403,200
- Quantization Precision: 4-bit
- File Size: 13669.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
4395b5c6136e29e9b11bdba2ee189302ad45dd5c3ef45073b729f077b8f0cec8
Replaced by: mistral-small3.1
Download: Mistral-Small-Instruct-2501-24B-Q4_K_M.gguf
Mistral Small 3.1 Instruct 2503 24B (mistral-small3.1)
Description: Mistral Small 3.1 (24B) enhances Mistral Small 3 with advanced vision, 128k context, multilingual support, agentic features, and efficient local deployment.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 23,572,403,200
- Quantization Precision: 4-bit
- File Size: 13669.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
68922ff3a311c81bc4e983f86e665a12213ee84710c210522f10e65ce980bda7
Download: Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
Nomic embed text v1.5 (nomic-embed-text)
Description: Provides flexible production embeddings using Matryoshka Representation Learning.
Specifications:
- Capabilities: Embeddings
- Architecture: nomic-bert
- Context Length: 2048 tokens
- Parameter Count: 136,731,648
- Quantization Precision: 4-bit
- File Size: 85.86 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
1a60949a331b30bb754ad60b7bdff80d8e563a56b3f7f3f1aed68db8c143003e
Download: nomic-embed-text-1.5-Q4_K_M.gguf
Nomic embed vision v1.5 (nomic-embed-vision)
Description: ViT-B/16-based image embedding model trained on 1.5B image-text pairs using Matryoshka Representation Learning. Outputs 768-dim embeddings aligned with Nomic Embed Text v1.5 for multimodal search, retrieval, and zero-shot classification.
Specifications:
- Capabilities: Embeddings
- Architecture: ViT-B/16
- Context Length: 197 tokens
- Parameter Count: 92,384,769
- Quantization Precision: 8-bit
- File Size: 92.26 MB
- Format: ONNX
- License: apache-2.0
- SHA256:
4f6f6a765625a4b74ec3e62141b7b83e1db1fb904afeda1fa00c1fefefbcc714
Download: nomic-embed-vision-1.5-Q8.lmk
Microsoft Phi 3.5 Mini Instruct 3.8B (phi3.5)
Description: A lightweight model optimized for reasoning-dense tasks and extended context support. Designed for efficient instruction following.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: phi3
- Context Length: 131072 tokens
- Parameter Count: 3,821,079,648
- Quantization Precision: 4-bit
- File Size: 2282.36 MB
- Format: GGUF
- License: mit
- SHA256:
782c34ae79564d1d92bd44dec233182559b3ecf6fedee44417e2a28c89bd0721
Replaced by: phi4-mini
Download: Phi-3.5-mini-Instruct-Q4_K_M.gguf
Microsoft Phi 4 Instruct 14.7B (phi4)
Description: An enhanced generative model trained on a diverse dataset to improve instruction adherence and reasoning capabilities.
Specifications:
- Capabilities: Text Generation, Chat, Math
- Architecture: phi3
- Context Length: 16384 tokens
- Parameter Count: 14,659,507,200
- Quantization Precision: 4-bit
- File Size: 8633.72 MB
- Format: GGUF
- License: mit
- SHA256:
03af8f5c5a87d526047f5c20c99e32bbafd5db6dbfdee8d498d0fe1a3c45af55
Download: Phi-4-14.7B-Instruct-Q4_K_M.gguf
Microsoft Phi 4 Mini Instruct 3.8B (phi4-mini)
Description: A lightweight open model from the Phi-4 family that uses synthetic and curated public data for reasoning-dense outputs, supports a 128K token context, and is enhanced through fine-tuning and preference optimization for precise instruction adherence and robust safety.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: phi3
- Context Length: 131072 tokens
- Parameter Count: 3,836,021,856
- Quantization Precision: 4-bit
- File Size: 2376.44 MB
- Format: GGUF
- License: mit
- SHA256:
556492e72efc8d33406b236830ad38d25669482ea7ad91fc643de237e942b9f9
Download: Phi-4-mini-Instruct-Q4_K_M.gguf
Mistral Pixtral 12B (pixtral)
Description: Pixtral 12B is a natively multimodal model combining a 12B-parameter decoder with a 400M-parameter vision encoder, trained on interleaved image–text data for variable image sizes, offering state-of-the-art performance in its weight class across multimodal and text-only benchmarks and supporting ultra-long 128K sequence lengths.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: llama
- Context Length: 1024000 tokens
- Parameter Count: 12,682,744,832
- Quantization Precision: 4-bit
- File Size: 7572.46 MB
- Format: LMK
- License: apache-2.0
- SHA256:
28d42e60b5f33765ac6f3882abc4c7fd9f5a7955910ff117c13dbfc5aa6bf159
Download: pixtral-12B-Q4_K_M.lmk
Alibaba Qwen 2 Vision Instruct 2.2B (qwen2-vl:2b)
Description: A multilingual vision-language model featuring dynamic resolution processing for advanced image and long-video understanding.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 32768 tokens
- Parameter Count: 2,208,985,700
- Quantization Precision: 4-bit
- File Size: 1303.99 MB
- Format: LMK
- License: apache-2.0
- SHA256:
b4e546acfd2271f5a0960b64445cae1091e5fc4192d74db72ae57c28729bd0b8
Replaced by: qwen2.5-vl:3b
Download: Qwen2-VL-2B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2 Vision Instruct 8.3B (qwen2-vl:8b)
Description: An extended variant in the Qwen 2 Vision family for multilingual vision-language tasks, including advanced video analysis.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 32768 tokens
- Parameter Count: 8,291,375,716
- Quantization Precision: 4-bit
- File Size: 4835.38 MB
- Format: LMK
- License: apache-2.0
- SHA256:
90b3eb60611559ba7521590ecccdf1d2a4dfab007566221c6a42f19b91b48686
Replaced by: qwen2.5-vl:7b
Download: Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2.5 Instruct 0.5B (qwen2.5:0.5b)
Description: A compact variant from the Alibaba Qwen 2.5 family, optimized for instruction following across chat, embeddings, and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 494,032,768
- Quantization Precision: 4-bit
- File Size: 379.38 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
09b44ff0ef0a160ffe50778c0828754201bb3a40522a941839c23acfbc9ceec0
Replaced by: qwen3:0.6b
Download: Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Instruct 3.1B (qwen2.5:3b)
Description: A mid-sized model from the Alibaba Qwen 2.5 series, designed for diverse tasks including chat, embeddings, and text generation. Performance should be evaluated relative to current benchmarks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 3,085,938,688
- Quantization Precision: 4-bit
- File Size: 1840.50 MB
- Format: GGUF
- License: qwen-research
- SHA256:
fb88cca2303e7f7d4d52679d633efe66d9c3e3555573b4444abe5ab8af4a97f7
Replaced by: qwen3:4b
Download: Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Instruct 7.6B (qwen2.5:7b)
Description: A larger variant from the Alibaba Qwen 2.5 series that supports extended context and multiple tasks including chat, embeddings, and text generation.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 7,615,616,512
- Quantization Precision: 4-bit
- File Size: 4466.13 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
2bf11b8a7d566bddfcc2b222ed7b918afc51239c5f919532de8b9403981ad866
Replaced by: qwen3:8b
Download: Qwen-2.5-7B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Vision Instruct 3B (qwen2.5-vl:3b)
Description: Qwen2.5 VL 3B Instruct is a compact vision-language chat model that delivers advanced object and text/chart understanding, agentic tool-driven interactions, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, powered by an optimized ViT encoder with dynamic temporal training.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 128000 tokens
- Parameter Count: 3,754,622,976
- Quantization Precision: 4-bit
- File Size: 2646.12 MB
- Format: LMK
- License: apache-2.0
- SHA256:
78fee4fde9f7fd93e1365cae46668184a259b1bd2a3169915a4a1e7495f859f8
Download: Qwen2.5-VL-3B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2.5 Vision Instruct 7B (qwen2.5-vl:7b)
Description: Qwen2.5 VL 7B Instruct is a next-generation vision-language chat model that combines advanced object and text/chart understanding, agentic tool use, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, all powered by a streamlined ViT encoder with dynamic temporal training.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 128000 tokens
- Parameter Count: 8,292,166,656
- Quantization Precision: 4-bit
- File Size: 5279.72 MB
- Format: LMK
- License: apache-2.0
- SHA256: e9a99c7bb06c23bd60594cebf8a881af13f502742df3047eaa3b466c747f7453
Download: Qwen2.5-VL-7B-Instruct-Q4_K_M.lmk
Alibaba Qwen 3 Instruct 0.6B (qwen3:0.6b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 751,632,384
- Quantization Precision: 4-bit
- File Size: 461.79 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 2b1a7ed56061ad1275847412f61e8e009ada37ef865dccc25747dcc76eea9811
Download: Qwen3-0.6B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 1.7B (qwen3:1.7b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 2,031,739,904
- Quantization Precision: 4-bit
- File Size: 1223.03 MB
- Format: GGUF
- License: apache-2.0
- SHA256: b047d6617eba56dcfa3357566b06807f54b15816faf6182aabd12d7e2378e537
Download: Qwen3-1.7B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 4B (qwen3:4b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 4,022,468,096
- Quantization Precision: 4-bit
- File Size: 2381.59 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 9dbc1e801f001ea316a627bb867fdd192fc3b36046fd69e160155ddc12129dbe
Download: Qwen3-4B-Q4_K_M.gguf
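Like any entry in this catalog, this model can be loaded by its Model ID using the pattern shown at the top of the page. A minimal sketch, following the catalog's own `LM.LoadFromModelID` example:

```csharp
// Load the Qwen 3 4B Instruct model by its catalog Model ID.
// LM-Kit resolves the ID to the listed GGUF download and returns a model handle.
var model = LM.LoadFromModelID("qwen3:4b");
```

The same one-line pattern applies to every other ID in this catalog (e.g. "qwq" or "qwen2.5-vl:7b").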
Alibaba Qwen 3 Instruct 8B (qwen3:8b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 8,190,735,360
- Quantization Precision: 4-bit
- File Size: 4794.87 MB
- Format: GGUF
- License: apache-2.0
- SHA256: b9059e3978453f50a8e9e45a825243abdb8739b2f4623e541fd5a392d9672c0f
Download: Qwen3-8B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 14B (qwen3:14b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 14,768,307,200
- Quantization Precision: 4-bit
- File Size: 8584.74 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 520369028ee99e4a3ca413a35126337038a8da561927f81322c1b34aed10e03d
Download: Qwen3-14B-Q4_K_M.gguf
Alibaba Qwen QwQ 32.5B (qwq)
Description: QwQ is a reasoning-focused model in the Qwen series that significantly outperforms conventional instruction-tuned models on challenging tasks, with QwQ-32B demonstrating competitive performance compared to top reasoning models like DeepSeek-R1 and o1-mini.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen2
- Context Length: 40960 tokens
- Parameter Count: 32,763,876,352
- Quantization Precision: 4-bit
- File Size: 18931.71 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17
Download: QwQ-32B-Q4_K_M.gguf
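Every catalog entry lists a SHA256 checksum for its download, so a manually downloaded file can be verified before use. A minimal sketch in Python (the local file path is hypothetical; the expected hash is the QwQ-32B value listed above):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so multi-GB GGUF downloads use constant memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected value from the QwQ-32B catalog entry above.
EXPECTED = "6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17"

# path = "QwQ-32B-Q4_K_M.gguf"  # hypothetical local download location
# assert sha256_of_file(path) == EXPECTED, "checksum mismatch: re-download the file"
```

Streaming the file rather than reading it whole matters here: several catalog entries are larger than typical available RAM (QwQ-32B alone is ~18.9 GB).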