Loading Models
A model can be loaded by its Model ID using the following code:
var model = LM.LoadFromModelID("gemma3:4b"); // Load the Gemma 3 model (4 billion parameters) by its unique Model ID

LM-Kit Model Catalog


Models

Model (ID) | Capabilities | Context (tokens) | Parameters | Format | License | Download
Full specifications for every model follow in the Model Details section below.

• BAAI bge small en v1.5 (bge-small) | Text Embeddings | 512 | 0.03 B | GGUF | mit | bge-small-en-v1.5-f16.gguf
• DeepSeek Coder V2 Lite (deepseek-coder-v2:16b) | Code Completion | 163840 | 15.71 B | GGUF | deepseek | DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf
• DeepSeek R1 Distill Llama (deepseek-r1:8b) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 131072 | 8.03 B | GGUF | mit | DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf
• TII Falcon 3 Instruct (falcon3:3b) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 32768 | 3.23 B | GGUF | falcon-llm-license | Falcon3-3B-Instruct-q4_k_m.gguf
• TII Falcon 3 Instruct (falcon3:7b) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 32768 | 7.62 B | GGUF | falcon-llm-license | Falcon-3-7.6B-Instruct-Q4_K_M.gguf
• TII Falcon 3 Instruct (falcon3:10b) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 32768 | 10.31 B | GGUF | falcon-llm-license | Falcon3-10B-Instruct-q4_k_m.gguf
• Google Gemma 3 (gemma3:1b) | Text Embeddings, Text Generation, Chat | 32768 | 1.00 B | GGUF | gemma | gemma-3-it-1B-Q4_K_M.gguf
• Google Gemma 3 (gemma3:4b) | Text Embeddings, Text Generation, Chat, Code Completion, Math, Vision | 131072 | 3.88 B | GGUF | gemma | gemma-3-4b-it-Q4_K_M.lmk
• Google Gemma 3 (gemma3:12b) | Text Embeddings, Text Generation, Chat, Code Completion, Math, Vision | 131072 | 11.77 B | GGUF | gemma | gemma-3-12b-it-Q4_K_M.lmk
• Google Gemma 3 (gemma3:27b) | Text Embeddings, Text Generation, Chat, Code Completion, Math, Vision | 131072 | 27.01 B | GGUF | gemma | gemma-3-27b-it-Q4_K_M.lmk
• IBM Granite 3.3 Instruct (granite3.3:2b) | Text Embeddings, Text Generation, Chat, Code Completion | 131072 | 2.53 B | GGUF | apache-2.0 | granite-3.3-2B-Instruct-Q4_K_M.gguf
• IBM Granite 3.3 Instruct (granite3.3:8b) | Text Embeddings, Text Generation, Chat, Code Completion | 131072 | 8.17 B | GGUF | apache-2.0 | granite-3.3-8B-Instruct-Q4_K_M.gguf
• Meta Llama 3.1 Instruct (llama3.1) | Text Embeddings, Text Generation, Chat | 131072 | 8.03 B | GGUF | llama3.1 | Llama-3.1-8B-Instruct-Q4_K_M.gguf
• Meta Llama 3.2 Instruct (llama3.2:1b) | Text Embeddings, Text Generation, Chat | 131072 | 1.24 B | GGUF | llama3.2 | Llama-3.2-1B-Instruct-Q4_K_M.gguf
• Meta Llama 3.2 Instruct (llama3.2:3b) | Text Embeddings, Text Generation, Chat | 131072 | 3.21 B | GGUF | llama3.2 | Llama-3.2-3B-Instruct-Q4_K_M.gguf
• Meta Llama 3.3 Instruct (llama3.3) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 131072 | 70.55 B | GGUF | llama3.3 | Llama-3.3-70B-Instruct-Q4_K_M.gguf
• LM-Kit Sarcasm Detection V1 (lmkit-sarcasm-detection) | Sentiment Analysis | 2048 | 1.10 B | GGUF | lm-kit | LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf
• LM-Kit Sentiment Analysis V2 (lmkit-sentiment-analysis) | Sentiment Analysis | 131072 | 1.24 B | GGUF | lm-kit | lm-kit-sentiment-analysis-2.0-1b-q4.gguf
• OpenBMB MiniCPM o 2.6 Vision (minicpm-o) | Text Embeddings, Text Generation, Chat, Vision | 32768 | 8.12 B | LMK | OpenBMB | MiniCPM-o-V-2.6-Q4_K_M.lmk
• Mistral Nemo Instruct 2407 (mistral-nemo) | Text Embeddings, Text Generation, Chat | 1024000 | 12.25 B | GGUF | apache-2.0 | Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf
• Mistral Small 3.1 Instruct 2503 (mistral-small3.1) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 131072 | 23.57 B | GGUF | apache-2.0 | Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
• Nomic embed text v1.5 (nomic-embed-text) | Text Embeddings | 2048 | 0.14 B | GGUF | apache-2.0 | nomic-embed-text-1.5-Q4_K_M.gguf
• Nomic embed vision v1.5 (nomic-embed-vision) | Image Embeddings | 197 | 0.09 B | ONNX | apache-2.0 | nomic-embed-vision-1.5-Q8.lmk
• Microsoft Phi 4 Instruct (phi4) | Text Embeddings, Text Generation, Chat, Math | 16384 | 14.66 B | GGUF | mit | Phi-4-14.7B-Instruct-Q4_K_M.gguf
• Microsoft Phi 4 Mini Instruct (phi4-mini) | Text Embeddings, Text Generation, Chat | 131072 | 3.84 B | GGUF | mit | Phi-4-mini-Instruct-Q4_K_M.gguf
• Alibaba Qwen 2 Vision (qwen2-vl:2b) | Text Embeddings, Text Generation, Chat, Vision | 32768 | 2.21 B | LMK | apache-2.0 | Qwen2-VL-2B-Instruct-Q4_K_M.lmk
• Alibaba Qwen 2 Vision (qwen2-vl:8b) | Text Embeddings, Text Generation, Chat, Vision | 32768 | 8.29 B | LMK | apache-2.0 | Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk
• Alibaba Qwen 2.5 Instruct (qwen2.5:0.5b) | Text Embeddings, Text Generation, Chat | 32768 | 0.49 B | GGUF | apache-2.0 | Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf
• Alibaba Qwen 2.5 Instruct (qwen2.5:3b) | Text Embeddings, Text Generation, Chat | 32768 | 3.09 B | GGUF | qwen-research | Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf
• Alibaba Qwen 2.5 Instruct (qwen2.5:7b) | Text Embeddings, Text Generation, Chat | 32768 | 7.62 B | GGUF | apache-2.0 | Qwen-2.5-7B-Instruct-Q4_K_M.gguf
• Alibaba Qwen QwQ (qwq) | Text Embeddings, Text Generation, Chat, Code Completion, Math | 40960 | 32.76 B | GGUF | apache-2.0 | QwQ-32B-Q4_K_M.gguf


Model Details

BAAI bge small en v1.5 (bge-small)

Description:

An efficient, CPU-friendly English embedding model (BAAI General Embedding) designed for lightweight applications.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: bert
  • Context Length: 512 tokens
  • Parameter Count: 33,212,160
  • Quantization Precision: 16-bit
  • File Size: 64.45 MB
  • Format: GGUF
  • License: mit
  • SHA256: cd5790da23df71e7e20fe20bb523bd4586a533a4ee813cc562e32b37929141c1

Download:
bge-small-en-v1.5-f16.gguf
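Each entry in this catalog lists a SHA256 digest, so a downloaded file can be checked for corruption or tampering before loading. The sketch below is illustrative and not part of the LM-Kit API; it streams the file in 1 MiB chunks so multi-gigabyte model files never need to fit in memory. The expected digest shown is the bge-small value from the specifications above.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, streaming it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Digest listed for bge-small-en-v1.5-f16.gguf in the catalog above.
EXPECTED = "cd5790da23df71e7e20fe20bb523bd4586a533a4ee813cc562e32b37929141c1"

def verify(path: str, expected: str = EXPECTED) -> bool:
    """Return True when the file's digest matches the catalog value."""
    return sha256_of(path) == expected
```

The same check works for any entry: substitute the SHA256 listed in its Model Details section.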

DeepSeek Coder V2 Lite 15.7B (deepseek-coder-v2:16b)

Description:

An open-source mixture-of-experts code model tailored for code completion tasks. Early evaluations indicated competitive performance relative to leading code models.

Specifications:

  • Capabilities: Code Completion
  • Architecture: deepseek2
  • Context Length: 163840 tokens
  • Parameter Count: 15,706,484,224
  • Quantization Precision: 4-bit
  • File Size: 9884.28 MB
  • Format: GGUF
  • License: deepseek
  • SHA256: ac398e8c1c670d3c362d3c1182614916bab7c364708ec073fcf947f6802d509e

Download:
DeepSeek-Coder-2-Lite-15.7B-Instruct-Q4_K_M.gguf

DeepSeek R1 Distill Llama 8B (deepseek-r1:8b)

Description:

DeepSeek-R1 enhances its predecessor by integrating cold-start data to overcome repetition and readability issues, achieving state-of-the-art performance in math, code, and reasoning tasks, with all models open-sourced.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 8,030,261,312
  • Quantization Precision: 4-bit
  • File Size: 4692.78 MB
  • Format: GGUF
  • License: mit
  • SHA256: 596fce705423e44831fe63367a30ccc7b36921c1bfdd4b9dfde85a5aa97ac2ef

Download:
DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf

TII Falcon 3 Instruct 3.2B (falcon3:3b)

Description:

Designed for multilingual tasks including chat, text generation, and code completion, supporting extended context lengths.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 32768 tokens
  • Parameter Count: 3,227,655,168
  • Quantization Precision: 4-bit
  • File Size: 1912.77 MB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: 81c6b52d221c2f0eea3db172fc74de28534f2fd15f198ecbfcc55577d20cbf8a

Download:
Falcon3-3B-Instruct-q4_k_m.gguf

TII Falcon 3 Instruct 7.6B (falcon3:7b)

Description:

Offers robust performance across chat, text generation, and mathematical reasoning tasks with extended context support.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 32768 tokens
  • Parameter Count: 7,615,616,512
  • Quantization Precision: 4-bit
  • File Size: 4358.03 MB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: 4ce1da546d76e04ce77eb076556eb25e1096faf6155ee429245e4bfa3f5ddf5d

Download:
Falcon-3-7.6B-Instruct-Q4_K_M.gguf

TII Falcon 3 Instruct 10.3B (falcon3:10b)

Description:

A larger variant tailored for multilingual dialogue, code completion, and complex reasoning tasks with extended context support.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 32768 tokens
  • Parameter Count: 10,305,653,760
  • Quantization Precision: 4-bit
  • File Size: 5996.25 MB
  • Format: GGUF
  • License: falcon-llm-license
  • SHA256: a0c0edbd35019ff26d972a0373b25b4c8d72315395a3b6036aca5e6bafa3d819

Download:
Falcon3-10B-Instruct-q4_k_m.gguf

Google Gemma 3 1B (gemma3:1b)

Description:

Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology. It supports text and image inputs, 128K-token context windows, and over 140 languages, and is optimized for resource-limited environments. Note that this 1B variant is text-only and, per the specifications below, supports a 32K context.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: gemma3
  • Context Length: 32768 tokens
  • Parameter Count: 999,885,952
  • Quantization Precision: 4-bit
  • File Size: 768.72 MB
  • Format: GGUF
  • License: gemma
  • SHA256: bacfe3de6eee9fba412d5c0415630172c2a602dae26bb353e1b20dd67194a226

Download:
gemma-3-it-1B-Q4_K_M.gguf

Google Gemma 3 3.9B (gemma3:4b)

Description:

Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology. It supports text and image inputs, 128K-token context windows, and over 140 languages, and is optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math Vision
  • Architecture: gemma3
  • Context Length: 131072 tokens
  • Parameter Count: 3,880,099,328
  • Quantization Precision: 4-bit
  • File Size: 2938.40 MB
  • Format: GGUF
  • License: gemma
  • SHA256: abb283e96c0abf58468a18127ce6e8b2bfc98e48f1ec618f658495c09254bdae

Download:
gemma-3-4b-it-Q4_K_M.lmk
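The "4-bit" precision figure is nominal: Q4_K_M quantization keeps some tensors (embeddings and certain attention weights) at higher precision, so the real footprint is larger than 4 bits per weight. A quick back-of-the-envelope check using the file size and parameter count listed above:

```python
# Values taken from the Google Gemma 3 3.9B specifications above.
FILE_SIZE_MB = 2938.40
PARAM_COUNT = 3_880_099_328

# Convert MB -> bytes -> bits, then divide by the parameter count.
bits_per_param = FILE_SIZE_MB * 1024 * 1024 * 8 / PARAM_COUNT
print(f"{bits_per_param:.2f} effective bits per parameter")  # ~6.35, not 4.0
```

The same arithmetic on other Q4_K_M entries in this catalog lands roughly in the 4.5 to 6.5 bits-per-weight range, which is a quick way to estimate whether a given model fits in available RAM or VRAM.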

Google Gemma 3 11.8B (gemma3:12b)

Description:

Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology. It supports text and image inputs, 128K-token context windows, and over 140 languages, and is optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math Vision
  • Architecture: gemma3
  • Context Length: 131072 tokens
  • Parameter Count: 11,765,788,416
  • Quantization Precision: 4-bit
  • File Size: 7529.17 MB
  • Format: GGUF
  • License: gemma
  • SHA256: d6f01cdb4369769ea87c5211a7fd865e12dbb9e2a937b43ef281a5b7e9ba2e35

Download:
gemma-3-12b-it-Q4_K_M.lmk

Google Gemma 3 27.2B (gemma3:27b)

Description:

Gemma is Google's lightweight, multimodal, open AI model family based on Gemini technology. It supports text and image inputs, 128K-token context windows, and over 140 languages, and is optimized for resource-limited environments.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math Vision
  • Architecture: gemma3
  • Context Length: 131072 tokens
  • Parameter Count: 27,009,002,240
  • Quantization Precision: 4-bit
  • File Size: 16350.05 MB
  • Format: GGUF
  • License: gemma
  • SHA256: 2d0e4382259ae2da28b9c0342e982a58eafbddad7c05bbfe6e104f2b3c165994

Download:
gemma-3-27b-it-Q4_K_M.lmk

IBM Granite 3.3 Instruct 2.5B (granite3.3:2b)

Description:

A long-context instruct model fine-tuned on a mix of open-source and synthetic datasets, designed for dialogue and text-generation tasks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion
  • Architecture: granite
  • Context Length: 131072 tokens
  • Parameter Count: 2,533,539,840
  • Quantization Precision: 4-bit
  • File Size: 1473.72 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: dbe4dd51bd6c1e39f96c831bf086454c9b313bd1c279ebb7166f2a37d86598da

Download:
granite-3.3-2B-Instruct-Q4_K_M.gguf

IBM Granite 3.3 Instruct 8.2B (granite3.3:8b)

Description:

An extended-context model optimized for dialogue and code completion tasks. Developed with diverse training data to enhance long-context understanding.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion
  • Architecture: granite
  • Context Length: 131072 tokens
  • Parameter Count: 8,170,864,640
  • Quantization Precision: 4-bit
  • File Size: 4713.89 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 1c890e740d7ecb010716a858eda315c01ac5bb0edfaf68bf17118868a26bb8ff

Download:
granite-3.3-8B-Instruct-Q4_K_M.gguf

Meta Llama 3.1 Instruct 8B (llama3.1)

Description:

A multilingual generative model optimized for dialogue and text generation tasks. Designed for robust performance on common benchmarks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 8,030,261,312
  • Quantization Precision: 4-bit
  • File Size: 4692.78 MB
  • Format: GGUF
  • License: llama3.1
  • SHA256: ad00fe50a62d1e009b4e06cd57ab55c9a30cbf5e7f183de09115d75ada73bd5b

Download:
Llama-3.1-8B-Instruct-Q4_K_M.gguf

Meta Llama 3.2 Instruct 1.2B (llama3.2:1b)

Description:

A multilingual instruct-tuned model optimized for dialogue, retrieval, and summarization tasks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 1,235,814,432
  • Quantization Precision: 4-bit
  • File Size: 770.28 MB
  • Format: GGUF
  • License: llama3.2
  • SHA256: 88725e821cf35f1a0dbeaa4a3bebeb91e6c6b6a9d50f808ab42d64233284cce1

Download:
Llama-3.2-1B-Instruct-Q4_K_M.gguf

Meta Llama 3.2 Instruct 3.2B (llama3.2:3b)

Description:

A multilingual dialogue model with robust text generation and summarization capabilities.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 3,212,749,888
  • Quantization Precision: 4-bit
  • File Size: 1925.83 MB
  • Format: GGUF
  • License: llama3.2
  • SHA256: 6810bf3cce69d440a22b85a3b3e28f57c868f1c98686abd995f1dc5d9b955cfe

Download:
Llama-3.2-3B-Instruct-Q4_K_M.gguf

Meta Llama 3.3 Instruct 70.6B (llama3.3)

Description:

A large multilingual generative model optimized for dialogue, text tasks, code completion, and mathematical reasoning with extended context support.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 70,553,706,560
  • Quantization Precision: 4-bit
  • File Size: 40550.61 MB
  • Format: GGUF
  • License: llama3.3
  • SHA256: 57f78fe3b141afa56406278265656524c51c9837edb3537ad43708b6d4ecc04d

Download:
Llama-3.3-70B-Instruct-Q4_K_M.gguf
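At 40550.61 MB, the quantized weights alone require roughly 40 GB of RAM or VRAM, and the KV cache comes on top of that, growing linearly with context length. The sketch below estimates it; the architecture figures used (80 layers, 8 grouped-query KV heads, head dimension 128) are assumptions about the upstream Llama 3.3 70B model, not values from this catalog.

```python
# Rough fp16 KV-cache estimate for a Llama-3.3-70B-class model at its full
# 131072-token context. LAYERS, KV_HEADS, and HEAD_DIM are assumed
# architecture values, not catalog data.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
FP16_BYTES, CONTEXT = 2, 131072

per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * FP16_BYTES  # K and V, all layers
total_gib = per_token * CONTEXT / 2**30
print(f"{per_token} bytes/token -> {total_gib:.0f} GiB at full context")  # 327680 bytes/token -> 40 GiB
```

Runtimes that quantize the KV cache (for example to 8-bit) cut this roughly in half, and smaller context windows shrink it proportionally.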

LM-Kit Sarcasm Detection V1 1.1B (lmkit-sarcasm-detection)

Description:

Optimized for detecting sarcasm in English text within the LM-Kit framework. Suitable for CPU-based inference.

Specifications:

  • Capabilities: Sentiment Analysis
  • Architecture: llama
  • Context Length: 2048 tokens
  • Parameter Count: 1,100,048,384
  • Quantization Precision: 4-bit
  • File Size: 636.88 MB
  • Format: GGUF
  • License: lm-kit
  • SHA256: cc82abd224dba9c689b19d368db6078d6167ca84897b21870d7d6a2c0f09d7d0

Download:
LM-Kit.Sarcasm_Detection-TinyLlama-1.1B-1T-OpenOrca-en-q4.gguf

LM-Kit Sentiment Analysis V2 1.2B (lmkit-sentiment-analysis)

Description:

Designed for multilingual sentiment analysis tasks, this LM-Kit model is optimized for efficient CPU-based inference.

Specifications:

  • Capabilities: Sentiment Analysis
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 1,235,814,432
  • Quantization Precision: 4-bit
  • File Size: 770.28 MB
  • Format: GGUF
  • License: lm-kit
  • SHA256: e12f4abf6453a8431985ce1d6350c265cd58b25210156a917e3608c850fd7add

Download:
lm-kit-sentiment-analysis-2.0-1b-q4.gguf

OpenBMB MiniCPM o 2.6 Vision 8.1B (minicpm-o)

Description:

An end-to-end multimodal model supporting real-time speech, image, and text understanding. Offers enhanced performance for conversational tasks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Vision
  • Architecture: qwen2
  • Context Length: 32768 tokens
  • Parameter Count: 8,116,736,752
  • Quantization Precision: 4-bit
  • File Size: 5120.87 MB
  • Format: LMK
  • License: OpenBMB
  • SHA256: 6fd17ed1f46bfcddb5a3482dd882dd022a46aa8c33cb93d75f809cd4d118ab53

Download:
MiniCPM-o-V-2.6-Q4_K_M.lmk

Mistral Nemo Instruct 2407 12.2B (mistral-nemo)

Description:

An instruct-tuned variant developed in collaboration with NVIDIA, balancing model size with performance for conversational tasks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: llama
  • Context Length: 1024000 tokens
  • Parameter Count: 12,247,782,400
  • Quantization Precision: 4-bit
  • File Size: 7130.82 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 579ab8f5178f5900d0c4e14534929aa0dba97e3f97be76b31ebe537ffd6cf169

Download:
Mistral-Nemo-2407-12.2B-Instruct-Q4_K_M.gguf

Mistral Small 3.1 Instruct 2503 24B (mistral-small3.1)

Description:

Mistral Small 3.1 (24B) enhances Mistral Small 3 with advanced vision, 128k context, multilingual support, agentic features, and efficient local deployment.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: llama
  • Context Length: 131072 tokens
  • Parameter Count: 23,572,403,200
  • Quantization Precision: 4-bit
  • File Size: 13669.88 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 68922ff3a311c81bc4e983f86e665a12213ee84710c210522f10e65ce980bda7

Download:
Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf

Nomic embed text v1.5 (nomic-embed-text)

Description:

Provides flexible production embeddings using Matryoshka Representation Learning.

Specifications:

  • Capabilities: Text Embeddings
  • Architecture: nomic-bert
  • Context Length: 2048 tokens
  • Parameter Count: 136,731,648
  • Quantization Precision: 4-bit
  • File Size: 85.86 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 1a60949a331b30bb754ad60b7bdff80d8e563a56b3f7f3f1aed68db8c143003e

Download:
nomic-embed-text-1.5-Q4_K_M.gguf

Nomic embed vision v1.5 (nomic-embed-vision)

Description:

ViT-B/16-based image embedding model trained on 1.5B image-text pairs using Matryoshka Representation Learning. Outputs 768-dim embeddings aligned with Nomic Embed Text v1.5 for multimodal search, retrieval, and zero-shot classification.

Specifications:

  • Capabilities: Image Embeddings
  • Architecture: ViT-B/16
  • Context Length: 197 tokens (the ViT-B/16 patch sequence: 196 image patches plus 1 class token)
  • Parameter Count: 92,384,769
  • Quantization Precision: 8-bit
  • File Size: 92.26 MB
  • Format: ONNX
  • License: apache-2.0
  • SHA256: 4f6f6a765625a4b74ec3e62141b7b83e1db1fb904afeda1fa00c1fefefbcc714

Download:
nomic-embed-vision-1.5-Q8.lmk

Microsoft Phi 4 Instruct 14.7B (phi4)

Description:

An enhanced generative model trained on a diverse dataset to improve instruction adherence and reasoning capabilities.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Math
  • Architecture: phi3
  • Context Length: 16384 tokens
  • Parameter Count: 14,659,507,200
  • Quantization Precision: 4-bit
  • File Size: 8633.72 MB
  • Format: GGUF
  • License: mit
  • SHA256: 03af8f5c5a87d526047f5c20c99e32bbafd5db6dbfdee8d498d0fe1a3c45af55

Download:
Phi-4-14.7B-Instruct-Q4_K_M.gguf

Microsoft Phi 4 Mini Instruct 3.8B (phi4-mini)

Description:

A lightweight open model from the Phi-4 family that uses synthetic and curated public data for reasoning-dense outputs, supports a 128K token context, and is enhanced through fine-tuning and preference optimization for precise instruction adherence and robust safety.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: phi3
  • Context Length: 131072 tokens
  • Parameter Count: 3,836,021,856
  • Quantization Precision: 4-bit
  • File Size: 2376.44 MB
  • Format: GGUF
  • License: mit
  • SHA256: 556492e72efc8d33406b236830ad38d25669482ea7ad91fc643de237e942b9f9

Download:
Phi-4-mini-Instruct-Q4_K_M.gguf

Alibaba Qwen 2 Vision 2.2B (qwen2-vl:2b)

Description:

A multilingual vision-language model featuring dynamic resolution processing for advanced image and long-video understanding.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Vision
  • Architecture: qwen2vl
  • Context Length: 32768 tokens
  • Parameter Count: 2,208,985,700
  • Quantization Precision: 4-bit
  • File Size: 1303.99 MB
  • Format: LMK
  • License: apache-2.0
  • SHA256: b4e546acfd2271f5a0960b64445cae1091e5fc4192d74db72ae57c28729bd0b8

Download:
Qwen2-VL-2B-Instruct-Q4_K_M.lmk

Alibaba Qwen 2 Vision 8.3B (qwen2-vl:8b)

Description:

An extended variant in the Qwen 2 Vision family for multilingual vision-language tasks, including advanced video analysis.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Vision
  • Architecture: qwen2vl
  • Context Length: 32768 tokens
  • Parameter Count: 8,291,375,716
  • Quantization Precision: 4-bit
  • File Size: 4835.38 MB
  • Format: LMK
  • License: apache-2.0
  • SHA256: 90b3eb60611559ba7521590ecccdf1d2a4dfab007566221c6a42f19b91b48686

Download:
Qwen2-VL-8.3B-Instruct-Q4_K_M.lmk

Alibaba Qwen 2.5 Instruct 0.5B (qwen2.5:0.5b)

Description:

A compact variant from the Alibaba Qwen 2.5 family, optimized for instruction following across chat, embeddings, and text generation tasks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: qwen2
  • Context Length: 32768 tokens
  • Parameter Count: 494,032,768
  • Quantization Precision: 4-bit
  • File Size: 379.38 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 09b44ff0ef0a160ffe50778c0828754201bb3a40522a941839c23acfbc9ceec0

Download:
Qwen-2.5-0.5B-Instruct-Q4_K_M.gguf

Alibaba Qwen 2.5 Instruct 3.1B (qwen2.5:3b)

Description:

A mid-sized model from the Alibaba Qwen 2.5 series, designed for diverse tasks including chat, embeddings, and text generation. Performance should be evaluated relative to current benchmarks.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: qwen2
  • Context Length: 32768 tokens
  • Parameter Count: 3,085,938,688
  • Quantization Precision: 4-bit
  • File Size: 1840.50 MB
  • Format: GGUF
  • License: qwen-research
  • SHA256: fb88cca2303e7f7d4d52679d633efe66d9c3e3555573b4444abe5ab8af4a97f7

Download:
Qwen-2.5-3.1B-Instruct-Q4_K_M.gguf

Alibaba Qwen 2.5 Instruct 7.6B (qwen2.5:7b)

Description:

A larger variant from the Alibaba Qwen 2.5 series that supports extended context and multiple tasks including chat, embeddings, and text generation.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat
  • Architecture: qwen2
  • Context Length: 32768 tokens
  • Parameter Count: 7,615,616,512
  • Quantization Precision: 4-bit
  • File Size: 4466.13 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 2bf11b8a7d566bddfcc2b222ed7b918afc51239c5f919532de8b9403981ad866

Download:
Qwen-2.5-7B-Instruct-Q4_K_M.gguf

Alibaba Qwen QwQ 32.5B (qwq)

Description:

QwQ is a reasoning-focused model in the Qwen series that significantly outperforms conventional instruction-tuned models on challenging tasks, with QwQ-32B demonstrating competitive performance compared to top reasoning models like DeepSeek-R1 and o1-mini.

Specifications:

  • Capabilities: Text Embeddings Text Generation Chat Code Completion Math
  • Architecture: qwen2
  • Context Length: 40960 tokens
  • Parameter Count: 32,763,876,352
  • Quantization Precision: 4-bit
  • File Size: 18931.71 MB
  • Format: GGUF
  • License: apache-2.0
  • SHA256: 6c2c72d16bbf5b0c30ac22031e0800b982b7d5c4e4d27daa62b66ee61c565d17

Download:
QwQ-32B-Q4_K_M.gguf