This page describes every model family in the LM-Kit catalog, organized by category. Use it as a reference when comparing models or exploring alternatives beyond the recommended starting points.

General-Purpose Chat and Reasoning Models

| Family | Sizes | Strengths |
| --- | --- | --- |
| GPT-OSS | 20B (MoE) | OpenAI open-weight. Near o3-mini on reasoning benchmarks (96% AIME 2024). Configurable reasoning effort. Strong agentic and tool-use capabilities. Runs on 16 GB VRAM thanks to MoE efficiency (see the memory sketch below this table). |
| GLM 4.7 | 30B (MoE, ~3B active) | Z.ai. Leads the 30B class on coding and agentic benchmarks (59% SWE-bench Verified, 79% Tau2-Bench). Strong math (92% AIME 2025). Interleaved thinking preserves reasoning context across tool calls. 200K context. |
| Gemma 3 | 270M, 1B, 4B, 12B, 27B | Google. Versatile all-rounder with vision support (4B and up). The 27B is among the highest-rated open models on LMArena. 128K context. Excellent quality-to-size ratio across the full range. |
| Qwen 3 | 0.6B, 1.7B, 4B, 8B, 14B | Alibaba. Dual-mode thinking (reasoning on/off in one model). 119 languages, strong math and tool calling. Each size roughly matches a Qwen 2.5 model twice as large. |
| Falcon H1R | 7B | TII. Hybrid Transformer + Mamba2 reasoning model. 88% on AIME 2024, outperforming many models up to 7x its size on math benchmarks. Exceptional inference speed (~1,500 tok/sec/GPU). 256K context. |
| Falcon 3 | 3B, 7B, 10B | TII. Open-weight dense models. Solid general-purpose chat with math and code. |
| Llama 3 | 1B, 3B, 8B, 70B | Meta. Well-rounded, large community. 131K context. Tool calling on the 8B (3.1) and 70B (3.3) variants. |
| Phi 4 | 3.8B (Mini), 14.7B | Microsoft. Compact and efficient. Strong for its size class, with good tool-calling support. |
| QwQ | 32.5B | Alibaba. Dedicated reasoning model with math, coding, and tool calling. 40K context. |
| Nemotron 3 Nano | 30B (MoE, ~3.5B active) | NVIDIA. Hybrid Mamba-2/Transformer reasoning model. 1M context. Strong on math and coding. |
| SmolLM3 | 3B | Hugging Face. Lightweight, math and code capable. 65K context. |
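
Several rows above quote memory footprints, such as the 16 GB VRAM figure in the first row. A useful rule of thumb is that the weights alone need roughly total parameters × bits per weight / 8 bytes, with the KV cache and runtime overhead on top. The snippet below is a back-of-envelope sketch of that arithmetic, not an LM-Kit API; the example figures are illustrative assumptions.

```python
def weight_vram_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Rough VRAM needed for model weights alone.

    1B parameters at 8 bits is ~1 GB, so scale by bits_per_weight / 8.
    KV cache and runtime overhead come on top of this estimate.
    """
    return params_billions * bits_per_weight / 8

# Illustrative example: a ~20B-parameter model quantized to 4 bits needs
# about 20 * 4 / 8 = 10 GB for weights, which is why a 16 GB GPU can host
# it with room left for the KV cache. (Assumed figures, not measurements.)
print(f"{weight_vram_gb(20, 4):.1f} GB")  # -> 10.0 GB
```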
Code Generation Models

| Family | Sizes | Strengths |
| --- | --- | --- |
| Devstral | 24B | Mistral. Purpose-built for agentic software engineering. 68% on SWE-bench Verified, the highest among open models under 30B. 393K context. Vision capable. |
| DeepSeek Coder | 16B | Specialized code generation. 163K context. |
| DeepSeek R1 | 8B (distilled) | Code and math reasoning. Distilled from the full R1 model. |
Mistral Family (Chat, Vision, Reasoning)

| Family | Sizes | Strengths |
| --- | --- | --- |
| Ministral 3 | 3B, 8B, 14B | Edge-optimized with vision and tool calling. 262K context. Great for on-device deployment. |
| Mistral Small 3.2 | 24B | Strong tool calling and code. 131K context. |
| Magistral Small | 24B | Reasoning specialist with transparent chain-of-thought. Tool calling support. |
| Pixtral | 12B | Vision-language model. 1M context. |
Vision / Multimodal Models

| Family | Sizes | Strengths |
| --- | --- | --- |
| Qwen 3 VL | 2B, 4B, 8B, 30B | Vision-language with tool calling, code, and math. 262K context. The 30B is an MoE variant. |
| MiniCPM | 8B, 9B | OpenBMB. Compact vision models. MiniCPM-o 4.5 is the latest, with strong visual understanding. |
| LightOnOCR | 1B | Specialized for OCR tasks. Lightweight. |
Enterprise and Long-Context Models

| Family | Sizes | Strengths |
| --- | --- | --- |
| Granite 4 Hybrid | 3B, 7B (MoE) | IBM. Hybrid Mamba-2/Transformer. Up to 1M-token context with 70% less memory than standard transformers. ISO 42001 certified. Strong instruction following and function calling. |
Embedding Models

| Family | Sizes | Strengths |
| --- | --- | --- |
| EmbeddingGemma | 300M | Google. Derived from Gemma. Highest-ranked open model under 500M parameters on MTEB. Excellent default for lightweight RAG (see the retrieval sketch below this table). 2K context. |
| Qwen 3 Embedding | 0.6B, 4B, 8B | Top 3 on MTEB multilingual (8B variant). 32K/40K context. Best overall accuracy for RAG, especially for multilingual and code. |
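
As a concrete picture of how these embedding models slot into a RAG pipeline, the sketch below ranks candidate passages by cosine similarity to a query. It is a minimal illustration: `embed(text)` stands in for whichever model you load (EmbeddingGemma, Qwen 3 Embedding, or another) and is a placeholder, not an LM-Kit call.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_passages(embed, query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Rank passages by semantic similarity to the query, best first.

    `embed` is a placeholder for your embedding model's encode call.
    """
    q = embed(query)
    scored = [(cosine_sim(q, embed(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```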
Note on AIME, cited throughout this page: it measures mathematical reasoning on competition-level math problems, and results are published per model release.
Tip: Benchmark scores are self-reported by model authors and may use different evaluation settings. Cross-reference multiple leaderboards, and always test on your own data before choosing a model for production.
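
In that spirit, a minimal harness for the "test on your own data" step might look like the sketch below. It assumes two placeholder callables you wire to your stack: `generate(model_id, prompt)` for inference (through LM-Kit or any other runtime) and `grade(output, expected)` for a task-specific score; neither name comes from the LM-Kit API.

```python
def compare_models(generate, grade, model_ids: list[str], cases: list[dict]) -> dict[str, float]:
    """Mean score per model over your own test cases.

    Each case is {"prompt": ..., "expected": ...}; `generate` and `grade`
    are placeholders you supply for inference and scoring.
    """
    results: dict[str, float] = {}
    for model_id in model_ids:
        total = sum(grade(generate(model_id, case["prompt"]), case["expected"])
                    for case in cases)
        results[model_id] = total / len(cases)
    return results
```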