A model can be loaded by its Model ID:

```csharp
var model = LM.LoadFromModelID("gemma3:4b");
```
LM-Kit Model Catalog
Model & ID | Capabilities | Context | Params. | Format | License | Download | Details
---|---|---|---|---|---|---|---
BAAI bge m3 (bge-m3) | Embeddings | 8192 | 0.57 B | GGUF | mit | bge-m3-Q4_K_M.gguf | details
BAAI bge m3 reranker v2 (bge-m3-reranker) | Text Reranking | 8192 | 0.57 B | GGUF | apache-2.0 | Bge-M3-568M-Q4_K_M.gguf | details
BAAI bge small en v1.5 (bge-small) | Embeddings | 512 | 0.03 B | GGUF | mit | bge-small-en-v1.5-f16.gguf | details
DeepSeek Coder V2 Lite (deepseek-coder-v2:16b) | Code Completion | 163840 | 15.71 B | GGUF | deepseek | DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf | details
DeepSeek R1 Distill Llama (deepseek-r1:8b) | Text Generation, Chat, Code Completion, Math | 131072 | 8.03 B | GGUF | mit | DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf | details
TII Falcon 3 Instruct (falcon3:3b) | Text Generation, Chat, Code Completion, Math | 32768 | 3.23 B | GGUF | falcon-llm-license | Falcon3-3B-Instruct-q4_k_m.gguf | details
TII Falcon 3 Instruct (falcon3:7b) | Text Generation, Chat, Code Completion, Math | 32768 | 7.62 B | GGUF | falcon-llm-license | Falcon-3-7.6B-Instruct-Q4_K_M.gguf | details
TII Falcon 3 Instruct (falcon3:10b) | Text Generation, Chat, Code Completion, Math | 32768 | 10.31 B | GGUF | falcon-llm-license | Falcon3-10B-Instruct-q4_k_m.gguf | details
Google Gemma 2 (gemma2:2b) | Text Generation, Chat | 8192 | 2.61 B | GGUF | gemma | gemma-2-2B-Q4_K_M.gguf | Replaced by gemma3:1b
Google Gemma 2 (gemma2:9b) | Text Generation, Chat | 8192 | 9.24 B | GGUF | gemma | gemma-2-9B-Q4_K_M.gguf | Replaced by gemma3:4b
Google Gemma 2 (gemma2:27b) | Text Generation, Chat | 8192 | 27.23 B | GGUF | gemma | gemma-2-27B-Q4_K_M.gguf | Replaced by gemma3:27b
Google Gemma 3 (gemma3:1b) | Text Generation, Chat | 32768 | 1.00 B | GGUF | gemma | gemma-3-it-1B-Q4_K_M.gguf | details
Google Gemma 3 (gemma3:4b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 3.88 B | GGUF | gemma | gemma-3-4b-it-Q4_K_M.lmk | details
Google Gemma 3 (gemma3:12b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 11.77 B | GGUF | gemma | gemma-3-12b-it-Q4_K_M.lmk | details
Google Gemma 3 (gemma3:27b) | Text Generation, Chat, Code Completion, Math, Vision | 131072 | 27.01 B | GGUF | gemma | gemma-3-27b-it-Q4_K_M.lmk | details
IBM Granite 3.1 Dense Instruct (granite3.1-dense:2b) | Text Generation, Chat, Code Completion | 131072 | 2.53 B | GGUF | apache-2.0 | granite-3.1-2.5B-Q4_K_M.gguf | Replaced by granite3.3:2b
IBM Granite 3.1 Dense Instruct (granite3.1-dense:8b) | Text Generation, Chat, Code Completion | 131072 | 8.17 B | GGUF | apache-2.0 | granite-3.1-8.2B-Q4_K_M.gguf | Replaced by granite3.3:8b
IBM Granite 3.3 Instruct (granite3.3:2b) | Text Generation, Chat, Code Completion | 131072 | 2.53 B | GGUF | apache-2.0 | granite-3.3-2B-Instruct-Q4_K_M.gguf | details
IBM Granite 3.3 Instruct (granite3.3:8b) | Text Generation, Chat, Code Completion | 131072 | 8.17 B | GGUF | apache-2.0 | granite-3.3-8B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.1 Instruct (llama3.1) | Text Generation, Chat | 131072 | 8.03 B | GGUF | llama3.1 | Llama-3.1-8B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.2 Instruct (llama3.2:1b) | Text Generation, Chat | 131072 | 1.24 B | GGUF | llama3.2 | Llama-3.2-1B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.2 Instruct (llama3.2:3b) | Text Generation, Chat | 131072 | 3.21 B | GGUF | llama3.2 | Llama-3.2-3B-Instruct-Q4_K_M.gguf | details
Meta Llama 3.3 Instruct (llama3.3) | Text Generation, Chat, Code Completion, Math | 131072 | 70.55 B | GGUF | llama3.3 | Llama-3.3-70B-Instruct-Q4_K_M.gguf | details
LM-Kit Sarcasm Detection V1 (lmkit-sarcasm-detection) | Sentiment Analysis | 2048 | 1.10 B | GGUF | lm-kit | LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf | details
LM-Kit Sentiment Analysis V2 (lmkit-sentiment-analysis) | Sentiment Analysis | 131072 | 1.24 B | GGUF | lm-kit | lm-kit-sentiment-analysis-2.0-1b-q4.gguf | details
OpenBMB MiniCPM o 2.6 Vision (minicpm-o) | Text Generation, Chat, Vision | 32768 | 8.12 B | LMK | OpenBMB | MiniCPM-o-V-2.6-Q4_K_M.lmk | details
OpenBMB MiniCPM 2.6 Vision (minicpm-v) | Text Generation, Chat, Vision | 32768 | 8.12 B | LMK | OpenBMB | MiniCPM-V-2.6-Q4_K_M.lmk | Replaced by minicpm-o
Mistral Nemo Instruct 2407 (mistral-nemo) | Text Generation, Chat | 1024000 | 12.25 B | GGUF | apache-2.0 | Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf | details
Mistral Small Instruct 2501 (mistral-small) | Text Generation, Chat, Code Completion, Math | 32768 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-Instruct-2501-24B-Q4_K_M.gguf | Replaced by mistral-small3.1
Mistral Small 3.1 Instruct 2503 (mistral-small3.1) | Text Generation, Chat, Code Completion, Math | 131072 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf | details
Nomic embed text v1.5 (nomic-embed-text) | Embeddings | 2048 | 0.14 B | GGUF | apache-2.0 | nomic-embed-text-1.5-Q4_K_M.gguf | details
Nomic embed vision v1.5 (nomic-embed-vision) | Embeddings | 197 | 0.09 B | ONNX | apache-2.0 | nomic-embed-vision-1.5-Q8.lmk | details
Microsoft Phi 3.5 Mini Instruct (phi3.5) | Text Generation, Chat | 131072 | 3.82 B | GGUF | mit | Phi-3.5-mini-Instruct-Q4_K_M.gguf | Replaced by phi4-mini
Microsoft Phi 4 Instruct (phi4) | Text Generation, Chat, Math | 16384 | 14.66 B | GGUF | mit | Phi-4-14.7B-Instruct-Q4_K_M.gguf | details
Microsoft Phi 4 Mini Instruct (phi4-mini) | Text Generation, Chat | 131072 | 3.84 B | GGUF | mit | Phi-4-mini-Instruct-Q4_K_M.gguf | details
Mistral Pixtral (pixtral) | Text Generation, Chat, Vision | 1024000 | 12.68 B | LMK | apache-2.0 | pixtral-12B-Q4_K_M.lmk | details
Alibaba Qwen 2 Vision Instruct (qwen2-vl:2b) | Text Generation, Chat, Vision | 32768 | 2.21 B | LMK | apache-2.0 | Qwen2-VL-2B-Instruct-Q4_K_M.lmk | Replaced by qwen2.5-vl:3b
Alibaba Qwen 2 Vision Instruct (qwen2-vl:8b) | Text Generation, Chat, Vision | 32768 | 8.29 B | LMK | apache-2.0 | Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk | Replaced by qwen2.5-vl:7b
Alibaba Qwen 2.5 Instruct (qwen2.5:0.5b) | Text Generation, Chat | 32768 | 0.49 B | GGUF | apache-2.0 | Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf | Replaced by qwen3:0.6b
Alibaba Qwen 2.5 Instruct (qwen2.5:3b) | Text Generation, Chat | 32768 | 3.09 B | GGUF | qwen-research | Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf | Replaced by qwen3:4b
Alibaba Qwen 2.5 Instruct (qwen2.5:7b) | Text Generation, Chat | 32768 | 7.62 B | GGUF | apache-2.0 | Qwen-2.5-7B-Instruct-Q4_K_M.gguf | Replaced by qwen3:8b
Alibaba Qwen 2.5 Vision Instruct (qwen2.5-vl:3b) | Text Generation, Chat, Vision | 128000 | 3.75 B | LMK | apache-2.0 | Qwen2.5-VL-3B-Instruct-Q4_K_M.lmk | details
Alibaba Qwen 2.5 Vision Instruct (qwen2.5-vl:7b) | Text Generation, Chat, Vision | 128000 | 8.29 B | LMK | apache-2.0 | Qwen2.5-VL-7B-Instruct-Q4_K_M.lmk | details
Alibaba Qwen 3 Instruct (qwen3:0.6b) | Text Generation, Chat | 40960 | 0.75 B | GGUF | apache-2.0 | Qwen3-0.6B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:1.7b) | Text Generation, Chat | 40960 | 2.03 B | GGUF | apache-2.0 | Qwen3-1.7B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:4b) | Text Generation, Chat, Code Completion, Math | 40960 | 4.02 B | GGUF | apache-2.0 | Qwen3-4B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:8b) | Text Generation, Chat, Code Completion, Math | 40960 | 8.19 B | GGUF | apache-2.0 | Qwen3-8B-Q4_K_M.gguf | details
Alibaba Qwen 3 Instruct (qwen3:14b) | Text Generation, Chat, Code Completion, Math | 40960 | 14.77 B | GGUF | apache-2.0 | Qwen3-14B-Q4_K_M.gguf | details
Alibaba Qwen QwQ (qwq) | Text Generation, Chat, Code Completion, Math | 40960 | 32.76 B | GGUF | apache-2.0 | QwQ-32B-Q4_K_M.gguf | details
Model Details
BAAI bge m3 (bge-m3)
Description: A unified, multilingual embedding model that delivers dense, sparse, and multi-vector retrieval on texts from short queries up to 8,192-token documents in over 100 languages.
Specifications:
- Capabilities: Embeddings
- Architecture: bert
- Context Length: 8192 tokens
- Parameter Count: 566,703,104
- Quantization Precision: 4-bit
- File Size: 417.50 MB
- Format: GGUF
- License: mit
- SHA256:
e251234fcb7d050991a6be491952f485bf5c641dd10c3272dc1301fd281ad50f
Download: bge-m3-Q4_K_M.gguf
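Every entry lists a SHA256 checksum alongside its download, so a weight file can be verified for integrity before loading. The following standalone sketch checks the bge-m3 file against the catalog value; the local path is hypothetical, and this is plain .NET, not an LM-Kit API call.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class VerifyChecksum
{
    static void Main()
    {
        // Hypothetical path to the downloaded weights.
        string path = "bge-m3-Q4_K_M.gguf";
        // Expected SHA256 from the catalog entry above.
        string expected = "e251234fcb7d050991a6be491952f485bf5c641dd10c3272dc1301fd281ad50f";

        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        string actual = Convert.ToHexString(sha.ComputeHash(stream)).ToLowerInvariant();

        Console.WriteLine(actual == expected ? "Checksum OK" : "Checksum mismatch");
    }
}
```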
BAAI bge m3 reranker v2 (bge-m3-reranker)
Description: A unified, multilingual reranker that ingests query–document pairs and directly produces sigmoid-normalized relevance scores across over 100 languages.
Specifications:
- Capabilities: Text Reranking
- Architecture: bert
- Context Length: 8192 tokens
- Parameter Count: 567,753,729
- Quantization Precision: 4-bit
- File Size: 418.07 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
ce947cece730cbf7d836da8c5490a9987ef0f919014b9275e7ce9aa12d96e6d9
Download: Bge-M3-568M-Q4_K_M.gguf
BAAI bge small en v1.5 (bge-small)
Description: An efficient, CPU-friendly English embedding model (BAAI General Embedding) designed for lightweight applications.
Specifications:
- Capabilities: Embeddings
- Architecture: bert
- Context Length: 512 tokens
- Parameter Count: 33,212,160
- Quantization Precision: 16-bit
- File Size: 64.45 MB
- Format: GGUF
- License: mit
- SHA256:
cd5790da23df71e7e20fe20bb523bd4586a533a4ee813cc562e32b37929141c1
Download: bge-small-en-v1.5-f16.gguf
DeepSeek Coder V2 Lite 15.7B (deepseek-coder-v2:16b)
Description: An open-source mixture-of-experts code model tailored for code completion tasks. Early evaluations indicated competitive performance relative to leading code models.
Specifications:
- Capabilities: Code Completion
- Architecture: deepseek2
- Context Length: 163840 tokens
- Parameter Count: 15,706,484,224
- Quantization Precision: 4-bit
- File Size: 9884.28 MB
- Format: GGUF
- License: deepseek
- SHA256:
ac398e8c1c670d3c362d3c1182614916bab7c364708ec073fcf947f6802d509e
Download: DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf
DeepSeek R1 Distill Llama 8B (deepseek-r1:8b)
Description: DeepSeek-R1 enhances its predecessor by integrating cold-start data to overcome repetition and readability issues, achieving state-of-the-art performance in math, code, and reasoning tasks, with all models open-sourced.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 8,030,261,312
- Quantization Precision: 4-bit
- File Size: 4692.78 MB
- Format: GGUF
- License: mit
- SHA256:
596fce705423e44831fe63367a30ccc7b36921c1bfdd4b9dfde85a5aa97ac2ef
Download: DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
TII Falcon 3 Instruct 3.2B (falcon3:3b)
Description: Designed for multilingual tasks including chat, text generation, and code completion, supporting extended context lengths.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 3,227,655,168
- Quantization Precision: 4-bit
- File Size: 1912.77 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
81c6b52d221c2f0eea3db172fc74de28534f2fd15f198ecbfcc55577d20cbf8a
Download: Falcon3-3B-Instruct-q4_k_m.gguf
TII Falcon 3 Instruct 7.6B (falcon3:7b)
Description: Offers robust performance across chat, text generation, and mathematical reasoning tasks with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 7,615,616,512
- Quantization Precision: 4-bit
- File Size: 4358.03 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
4ce1da546d76e04ce77eb076556eb25e1096faf6155ee429245e4bfa3f5ddf5d
Download: Falcon-3-7.6B-Instruct-Q4_K_M.gguf
TII Falcon 3 Instruct 10.3B (falcon3:10b)
Description: A larger variant tailored for multilingual dialogue, code completion, and complex reasoning tasks with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 10,305,653,760
- Quantization Precision: 4-bit
- File Size: 5996.25 MB
- Format: GGUF
- License: falcon-llm-license
- SHA256:
a0c0edbd35019ff26d972a0373b25b4c8d72315395a3b6036aca5e6bafa3d819
Download: Falcon3-10B-Instruct-q4_k_m.gguf
Google Gemma 2 2.6B (gemma2:2b)
Description: A lightweight decoder-only model from Google, available in both pre-trained and instruction-tuned variants for text-to-text tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 2,614,341,888
- Quantization Precision: 4-bit
- File Size: 1629.43 MB
- Format: GGUF
- License: gemma
- SHA256:
362d09c1496e035ecf0737d8fe03e8e607c61e57e16b22cedd158525f6721e06
Replaced by: gemma3:1b
Download: gemma-2-2B-Q4_K_M.gguf
Google Gemma 2 9.2B (gemma2:9b)
Description: A decoder-only text-to-text model from Google, offering competitive performance in both pre-trained and instruction-tuned configurations.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 9,241,705,984
- Quantization Precision: 4-bit
- File Size: 5494.17 MB
- Format: GGUF
- License: gemma
- SHA256:
b6059a960d2f4f881630f1e795b40f7e09e5e12d3a6b1900474d6108ea880afd
Replaced by: gemma3:4b
Download: gemma-2-9B-Q4_K_M.gguf
Google Gemma 2 27.2B (gemma2:27b)
Description: A larger variant in the Gemma 2 family, optimized for text generation and instruction following with open weights provided.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma2
- Context Length: 8192 tokens
- Parameter Count: 27,227,128,320
- Quantization Precision: 4-bit
- File Size: 15874.27 MB
- Format: GGUF
- License: gemma
- SHA256:
bb4b276745da743d550720dc2e6c847498eef45e7b82a4d5a73ef6636f78027a
Replaced by: gemma3:27b
Download: gemma-2-27B-Q4_K_M.gguf
Google Gemma 3 1B (gemma3:1b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: gemma3
- Context Length: 32768 tokens
- Parameter Count: 999,885,952
- Quantization Precision: 4-bit
- File Size: 768.72 MB
- Format: GGUF
- License: gemma
- SHA256:
bacfe3de6eee9fba412d5c0415630172c2a602dae26bb353e1b20dd67194a226
Download: gemma-3-it-1B-Q4_K_M.gguf
Google Gemma 3 3.9B (gemma3:4b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 3,880,099,328
- Quantization Precision: 4-bit
- File Size: 2938.40 MB
- Format: GGUF
- License: gemma
- SHA256:
abb283e96c0abf58468a18127ce6e8b2bfc98e48f1ec618f658495c09254bdae
Download: gemma-3-4b-it-Q4_K_M.lmk
Google Gemma 3 11.8B (gemma3:12b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 11,765,788,416
- Quantization Precision: 4-bit
- File Size: 7529.17 MB
- Format: GGUF
- License: gemma
- SHA256:
d6f01cdb4369769ea87c5211a7fd865e12dbb9e2a937b43ef281a5b7e9ba2e35
Download: gemma-3-12b-it-Q4_K_M.lmk
Google Gemma 3 27.2B (gemma3:27b)
Description: Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology, supporting text and image inputs, 128K context windows, multilingual capabilities in over 140 languages, and optimized for resource-limited environments.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math, Vision
- Architecture: gemma3
- Context Length: 131072 tokens
- Parameter Count: 27,009,002,240
- Quantization Precision: 4-bit
- File Size: 16350.05 MB
- Format: GGUF
- License: gemma
- SHA256:
2d0e4382259ae2da28b9c0342e982a58eafbddad7c05bbfe6e104f2b3c165994
Download: gemma-3-27b-it-Q4_K_M.lmk
IBM Granite 3.1 Dense Instruct 2.5B (granite3.1-dense:2b)
Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 2,533,531,648
- Quantization Precision: 4-bit
- File Size: 1473.71 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
ba05b36d0a8cebf8ccd13bbbb904bebe182f4854fbcff19cd1ee54bc82bbd298
Replaced by: granite3.3:2b
Download: granite-3.1-2.5B-Q4_K_M.gguf
IBM Granite 3.1 Dense Instruct 8.2B (granite3.1-dense:8b)
Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 8,170,848,256
- Quantization Precision: 4-bit
- File Size: 4713.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
d1ada98d7b274fc6b119bd19b8d3536cd006544e9aae06db6f8b2c256d584044
Replaced by: granite3.3:8b
Download: granite-3.1-8.2B-Q4_K_M.gguf
IBM Granite 3.3 Instruct 2.5B (granite3.3:2b)
Description: A long-context instruct model finetuned with a mix of open source and synthetic datasets. Designed for dialogue and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 2,533,539,840
- Quantization Precision: 4-bit
- File Size: 1473.72 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
dbe4dd51bd6c1e39f96c831bf086454c9b313bd1c279ebb7166f2a37d86598da
Download: granite-3.3-2B-Instruct-Q4_K_M.gguf
IBM Granite 3.3 Instruct 8.2B (granite3.3:8b)
Description: An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion
- Architecture: granite
- Context Length: 131072 tokens
- Parameter Count: 8,170,864,640
- Quantization Precision: 4-bit
- File Size: 4713.89 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
1c890e740d7ecb010716a858eda315c01ac5bb0edfaf68bf17118868a26bb8ff
Download: granite-3.3-8B-Instruct-Q4_K_M.gguf
Meta Llama 3.1 Instruct 8B (llama3.1)
Description: A multilingual generative model optimized for dialogue and text generation tasks. Designed for robust performance on common benchmarks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 8,030,261,312
- Quantization Precision: 4-bit
- File Size: 4692.78 MB
- Format: GGUF
- License: llama3.1
- SHA256:
ad00fe50a62d1e009b4e06cd57ab55c9a30cbf5e7f183de09115d75ada73bd5b
Download: Llama-3.1-8B-Instruct-Q4_K_M.gguf
Meta Llama 3.2 Instruct 1.2B (llama3.2:1b)
Description: A multilingual instruct-tuned model optimized for dialogue, retrieval, and summarization tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 1,235,814,432
- Quantization Precision: 4-bit
- File Size: 770.28 MB
- Format: GGUF
- License: llama3.2
- SHA256:
88725e821cf35f1a0dbeaa4a3bebeb91e6c6b6a9d50f808ab42d64233284cce1
Download: Llama-3.2-1B-Instruct-Q4_K_M.gguf
Meta Llama 3.2 Instruct 3.2B (llama3.2:3b)
Description: A multilingual dialogue model with robust text generation and summarization capabilities.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 3,212,749,888
- Quantization Precision: 4-bit
- File Size: 1925.83 MB
- Format: GGUF
- License: llama3.2
- SHA256:
6810bf3cce69d440a22b85a3b3e28f57c868f1c98686abd995f1dc5d9b955cfe
Download: Llama-3.2-3B-Instruct-Q4_K_M.gguf
Meta Llama 3.3 Instruct 70.6B (llama3.3)
Description: A large multilingual generative model optimized for dialogue, text tasks, code completion, and mathematical reasoning with extended context support.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 70,553,706,560
- Quantization Precision: 4-bit
- File Size: 40550.61 MB
- Format: GGUF
- License: llama3.3
- SHA256:
57f78fe3b141afa56406278265656524c51c9837edb3537ad43708b6d4ecc04d
Download: Llama-3.3-70B-Instruct-Q4_K_M.gguf
LM-Kit Sarcasm Detection V1 1.1B (lmkit-sarcasm-detection)
Description: Optimized for detecting sarcasm in English text within the LM-Kit framework. Suitable for CPU-based inference.
Specifications:
- Capabilities: Sentiment Analysis
- Architecture: llama
- Context Length: 2048 tokens
- Parameter Count: 1,100,048,384
- Quantization Precision: 4-bit
- File Size: 636.88 MB
- Format: GGUF
- License: lm-kit
- SHA256:
cc82abd224dba9c689b19d368db6078d6167ca84897b21870d7d6a2c0f09d7d0
Download: LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf
LM-Kit Sentiment Analysis V2 1.2B (lmkit-sentiment-analysis)
Description: Designed for multilingual sentiment analysis tasks, this LM-Kit model is optimized for efficient CPU-based inference.
Specifications:
- Capabilities: Sentiment Analysis
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 1,235,814,432
- Quantization Precision: 4-bit
- File Size: 770.28 MB
- Format: GGUF
- License: lm-kit
- SHA256:
e12f4abf6453a8431985ce1d6350c265cd58b25210156a917e3608c850fd7add
Download: lm-kit-sentiment-analysis-2.0-1b-q4.gguf
OpenBMB MiniCPM o 2.6 Vision 8.1B (minicpm-o)
Description: An end-to-end multimodal model supporting real-time speech, image, and text understanding. Offers enhanced performance for conversational tasks.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 8,116,736,752
- Quantization Precision: 4-bit
- File Size: 5120.87 MB
- Format: LMK
- License: OpenBMB
- SHA256:
6fd17ed1f46bfcddb5a3482dd882dd022a46aa8c33cb93d75f809cd4d118ab53
Download: MiniCPM-o-V-2.6-Q4_K_M.lmk
OpenBMB MiniCPM 2.6 Vision 8.1B (minicpm-v)
Description: A multimodal model designed for vision and text tasks, built upon SigLip and Qwen architectures. Evaluate performance against current benchmarks.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 8,116,736,752
- Quantization Precision: 4-bit
- File Size: 5120.70 MB
- Format: LMK
- License: OpenBMB
- SHA256:
a10b1aa434899ea0bd5bb5e281f622fed0b02434241d53435fce05773fa7cfa8
Replaced by: minicpm-o
Download: MiniCPM-V-2.6-Q4_K_M.lmk
Mistral Nemo Instruct 2407 12.2B (mistral-nemo)
Description: An instruct-tuned variant developed in collaboration with NVIDIA, balancing model size with performance for conversational tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: llama
- Context Length: 1024000 tokens
- Parameter Count: 12,247,782,400
- Quantization Precision: 4-bit
- File Size: 7130.82 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
579ab8f5178f5900d0c4e14534929aa0dba97e3f97be76b31ebe537ffd6cf169
Download: Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf
Mistral Small Instruct 2501 24B (mistral-small)
Description: Optimized for local deployment, this model balances parameter count and performance for chat and code tasks.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 32768 tokens
- Parameter Count: 23,572,403,200
- Quantization Precision: 4-bit
- File Size: 13669.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
4395b5c6136e29e9b11bdba2ee189302ad45dd5c3ef45073b729f077b8f0cec8
Replaced by: mistral-small3.1
Download: Mistral-Small-Instruct-2501-24B-Q4_K_M.gguf
Mistral Small 3.1 Instruct 2503 24B (mistral-small3.1)
Description: Mistral Small 3.1 (24B) enhances Mistral Small 3 with advanced vision, 128k context, multilingual support, agentic features, and efficient local deployment.
Specifications:
- Capabilities: Text Generation, Chat, Code Completion, Math
- Architecture: llama
- Context Length: 131072 tokens
- Parameter Count: 23,572,403,200
- Quantization Precision: 4-bit
- File Size: 13669.88 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
68922ff3a311c81bc4e983f86e665a12213ee84710c210522f10e65ce980bda7
Download: Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
Nomic embed text v1.5 (nomic-embed-text)
Description: Provides flexible production embeddings using Matryoshka Representation Learning.
Specifications:
- Capabilities: Embeddings
- Architecture: nomic-bert
- Context Length: 2048 tokens
- Parameter Count: 136,731,648
- Quantization Precision: 4-bit
- File Size: 85.86 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
1a60949a331b30bb754ad60b7bdff80d8e563a56b3f7f3f1aed68db8c143003e
Download: nomic-embed-text-1.5-Q4_K_M.gguf
Nomic embed vision v1.5 (nomic-embed-vision)
Description: ViT-B/16-based image embedding model trained on 1.5B image-text pairs using Matryoshka Representation Learning. Outputs 768-dim embeddings aligned with Nomic Embed Text v1.5 for multimodal search, retrieval, and zero-shot classification.
Specifications:
- Capabilities: Embeddings
- Architecture: ViT-B/16
- Context Length: 197 tokens
- Parameter Count: 92,384,769
- Quantization Precision: 8-bit
- File Size: 92.26 MB
- Format: ONNX
- License: apache-2.0
- SHA256:
4f6f6a765625a4b74ec3e62141b7b83e1db1fb904afeda1fa00c1fefefbcc714
Download: nomic-embed-vision-1.5-Q8.lmk
Microsoft Phi 3.5 Mini Instruct 3.8B (phi3.5)
Description: A lightweight model optimized for reasoning-dense tasks and extended context support. Designed for efficient instruction following.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: phi3
- Context Length: 131072 tokens
- Parameter Count: 3,821,079,648
- Quantization Precision: 4-bit
- File Size: 2282.36 MB
- Format: GGUF
- License: mit
- SHA256:
782c34ae79564d1d92bd44dec233182559b3ecf6fedee44417e2a28c89bd0721
Replaced by: phi4-mini
Download: Phi-3.5-mini-Instruct-Q4_K_M.gguf
Microsoft Phi 4 Instruct 14.7B (phi4)
Description: An enhanced generative model trained on a diverse dataset to improve instruction adherence and reasoning capabilities.
Specifications:
- Capabilities: Text Generation, Chat, Math
- Architecture: phi3
- Context Length: 16384 tokens
- Parameter Count: 14,659,507,200
- Quantization Precision: 4-bit
- File Size: 8633.72 MB
- Format: GGUF
- License: mit
- SHA256:
03af8f5c5a87d526047f5c20c99e32bbafd5db6dbfdee8d498d0fe1a3c45af55
Download: Phi-4-14.7B-Instruct-Q4_K_M.gguf
Microsoft Phi 4 Mini Instruct 3.8B (phi4-mini)
Description: A lightweight open model from the Phi-4 family that uses synthetic and curated public data for reasoning-dense outputs, supports a 128K token context, and is enhanced through fine-tuning and preference optimization for precise instruction adherence and robust safety.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: phi3
- Context Length: 131072 tokens
- Parameter Count: 3,836,021,856
- Quantization Precision: 4-bit
- File Size: 2376.44 MB
- Format: GGUF
- License: mit
- SHA256:
556492e72efc8d33406b236830ad38d25669482ea7ad91fc643de237e942b9f9
Download: Phi-4-mini-Instruct-Q4_K_M.gguf
Mistral Pixtral 12B (pixtral)
Description: Pixtral 12B is a natively multimodal model combining a 12B-parameter decoder with a 400M-parameter vision encoder, trained on interleaved image–text data for variable image sizes, offering state-of-the-art performance in its weight class across multimodal and text-only benchmarks and supporting ultra-long 128K sequence lengths.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: llama
- Context Length: 1024000 tokens
- Parameter Count: 12,682,744,832
- Quantization Precision: 4-bit
- File Size: 7572.46 MB
- Format: LMK
- License: apache-2.0
- SHA256:
28d42e60b5f33765ac6f3882abc4c7fd9f5a7955910ff117c13dbfc5aa6bf159
Download: pixtral-12B-Q4_K_M.lmk
Alibaba Qwen 2 Vision Instruct 2.2B (qwen2-vl:2b)
Description: A multilingual vision-language model featuring dynamic resolution processing for advanced image and long-video understanding.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 32768 tokens
- Parameter Count: 2,208,985,700
- Quantization Precision: 4-bit
- File Size: 1303.99 MB
- Format: LMK
- License: apache-2.0
- SHA256:
b4e546acfd2271f5a0960b64445cae1091e5fc4192d74db72ae57c28729bd0b8
Replaced by: qwen2.5-vl:3b
Download: Qwen2-VL-2B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2 Vision Instruct 8.3B (qwen2-vl:8b)
Description: An extended variant in the Qwen 2 Vision family for multilingual vision-language tasks, including advanced video analysis.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 32768 tokens
- Parameter Count: 8,291,375,716
- Quantization Precision: 4-bit
- File Size: 4835.38 MB
- Format: LMK
- License: apache-2.0
- SHA256:
90b3eb60611559ba7521590ecccdf1d2a4dfab007566221c6a42f19b91b48686
Replaced by: qwen2.5-vl:7b
Download: Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2.5 Instruct 0.5B (qwen2.5:0.5b)
Description: A compact variant from the Alibaba Qwen 2.5 family, optimized for instruction following across chat, embeddings, and text generation tasks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 494,032,768
- Quantization Precision: 4-bit
- File Size: 379.38 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
09b44ff0ef0a160ffe50778c0828754201bb3a40522a941839c23acfbc9ceec0
Replaced by: qwen3:0.6b
Download: Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Instruct 3.1B (qwen2.5:3b)
Description: A mid-sized model from the Alibaba Qwen 2.5 series, designed for diverse tasks including chat, embeddings, and text generation. Performance should be evaluated relative to current benchmarks.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 3,085,938,688
- Quantization Precision: 4-bit
- File Size: 1840.50 MB
- Format: GGUF
- License: qwen-research
- SHA256:
fb88cca2303e7f7d4d52679d633efe66d9c3e3555573b4444abe5ab8af4a97f7
Replaced by: qwen3:4b
Download: Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Instruct 7.6B (qwen2.5:7b)
Description: A larger variant from the Alibaba Qwen 2.5 series that supports extended context and multiple tasks including chat, embeddings, and text generation.
Specifications:
- Capabilities: Text Generation, Chat
- Architecture: qwen2
- Context Length: 32768 tokens
- Parameter Count: 7,615,616,512
- Quantization Precision: 4-bit
- File Size: 4466.13 MB
- Format: GGUF
- License: apache-2.0
- SHA256:
2bf11b8a7d566bddfcc2b222ed7b918afc51239c5f919532de8b9403981ad866
Replaced by: qwen3:8b
Download: Qwen-2.5-7B-Instruct-Q4_K_M.gguf
Alibaba Qwen 2.5 Vision Instruct 3B (qwen2.5-vl:3b)
Description: Qwen2.5 VL 3B Instruct is a compact vision-language chat model that delivers advanced object and text/chart understanding, agentic tool-driven interactions, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, powered by an optimized ViT encoder with dynamic temporal training.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 128000 tokens
- Parameter Count: 3,754,622,976
- Quantization Precision: 4-bit
- File Size: 2646.12 MB
- Format: LMK
- License: apache-2.0
- SHA256:
78fee4fde9f7fd93e1365cae46668184a259b1bd2a3169915a4a1e7495f859f8
Download: Qwen2.5-VL-3B-Instruct-Q4_K_M.lmk
Alibaba Qwen 2.5 Vision Instruct 7B (qwen2.5-vl:7b)
Description: Qwen2.5 VL 7B Instruct is a next-generation vision-language chat model that combines advanced object and text/chart understanding, agentic tool use, long-video event localization, precise visual grounding with JSON outputs, and structured data extraction, all powered by a streamlined ViT encoder with dynamic temporal training.
Specifications:
- Capabilities: Text Generation, Chat, Vision
- Architecture: qwen2vl
- Context Length: 128000 tokens
- Parameter Count: 8,292,166,656
- Quantization Precision: 4-bit
- File Size: 5279.72 MB
- Format: LMK
- License: apache-2.0
- SHA256: e9a99c7bb06c23bd60594cebf8a881af13f502742df3047eaa3b466c747f7453
Download: Qwen2.5-VL-7B-Instruct-Q4_K_M.lmk
Alibaba Qwen 3 Instruct 0.6B (qwen3:0.6b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 751,632,384
- Quantization Precision: 4-bit
- File Size: 461.79 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 2b1a7ed56061ad1275847412f61e8e009ada37ef865dccc25747dcc76eea9811
Download: Qwen3-0.6B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 1.7B (qwen3:1.7b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 2,031,739,904
- Quantization Precision: 4-bit
- File Size: 1223.03 MB
- Format: GGUF
- License: apache-2.0
- SHA256: b047d6617eba56dcfa3357566b06807f54b15816faf6182aabd12d7e2378e537
Download: Qwen3-1.7B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 4B (qwen3:4b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 4,022,468,096
- Quantization Precision: 4-bit
- File Size: 2381.59 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 9dbc1e801f001ea316a627bb867fdd192fc3b36046fd69e160155ddc12129dbe
Download: Qwen3-4B-Q4_K_M.gguf
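Like any entry in this catalog, this model can be loaded by its Model ID using the pattern shown at the top of the page. A minimal sketch, following the catalog's own `LM.LoadFromModelID` example:

```csharp
// Load the Qwen 3 4B Instruct model by its catalog Model ID.
// LM-Kit resolves the ID to the listed GGUF download and returns a model handle.
var model = LM.LoadFromModelID("qwen3:4b");
```

The same one-line pattern applies to every other ID in this catalog (e.g. "qwq" or "qwen2.5-vl:7b").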
Alibaba Qwen 3 Instruct 8B (qwen3:8b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 8,190,735,360
- Quantization Precision: 4-bit
- File Size: 4794.87 MB
- Format: GGUF
- License: apache-2.0
- SHA256: b9059e3978453f50a8e9e45a825243abdb8739b2f4623e541fd5a392d9672c0f
Download: Qwen3-8B-Q4_K_M.gguf
Alibaba Qwen 3 Instruct 14B (qwen3:14b)
Description: Qwen3 is the latest generation of Qwen large language models, combining dense and MoE architectures with seamless “thinking” vs. “non‐thinking” mode switching to deliver state-of-the-art reasoning, coding, agent integration, and instruction-following across 100+ languages.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen3
- Context Length: 40960 tokens
- Parameter Count: 14,768,307,200
- Quantization Precision: 4-bit
- File Size: 8584.74 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 520369028ee99e4a3ca413a35126337038a8da561927f81322c1b34aed10e03d
Download: Qwen3-14B-Q4_K_M.gguf
Alibaba Qwen QwQ 32.5B (qwq)
Description: QwQ is a reasoning-focused model in the Qwen series that significantly outperforms conventional instruction-tuned models on challenging tasks, with QwQ-32B demonstrating competitive performance compared to top reasoning models like DeepSeek-R1 and o1-mini.
Specifications:
- Capabilities: Text Generation Chat Code Completion Math
- Architecture: qwen2
- Context Length: 40960 tokens
- Parameter Count: 32,763,876,352
- Quantization Precision: 4-bit
- File Size: 18931.71 MB
- Format: GGUF
- License: apache-2.0
- SHA256: 6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17
Download: QwQ-32B-Q4_K_M.gguf
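Every catalog entry lists a SHA256 checksum for its download, so a manually downloaded file can be verified before use. A minimal sketch in Python (the local file path is hypothetical; the expected hash is the QwQ-32B value listed above):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so multi-GB GGUF downloads use constant memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected value from the QwQ-32B catalog entry above.
EXPECTED = "6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17"

# path = "QwQ-32B-Q4_K_M.gguf"  # hypothetical local download location
# assert sha256_of_file(path) == EXPECTED, "checksum mismatch: re-download the file"
```

Streaming the file rather than reading it whole matters here: several catalog entries are larger than typical available RAM (QwQ-32B alone is ~18.9 GB).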