Which Models Work Best for Function Calling and Tool Use?
TL;DR
Not all models handle tool calling equally. Function calling requires the model to generate valid JSON with correct argument names and types, which smaller and heavily quantized models often struggle with. For reliable tool use, prefer models with the FunctionCalling capability flag. Qwen 3.5 (4B+), GLM 4.7 Flash, and GPT-OSS 20B are the strongest choices in the LM-Kit.NET catalog. For agentic workflows with multi-step reasoning, use the largest model your hardware supports.
How Function Calling Works at the Model Level
When a model uses a tool, it must:
1. Decide whether to call a tool (versus generating text).
2. Select the right tool from the available list.
3. Generate valid JSON matching the tool's input schema exactly.
4. Interpret the tool result and continue reasoning.
Steps 2 and 3 are where models fail most often. Smaller models may hallucinate tool names, produce malformed JSON, or miss required fields. LM-Kit.NET corrects minor JSON formatting issues, but fundamentally better models produce fundamentally better tool calls.
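To make the failure modes above concrete, here is a minimal sketch of the checks a host application could run on a raw tool call before executing it. It is not LM-Kit.NET's own correction logic: the `ToolCallValidator` class, the `{"name": ..., "arguments": {...}}` call shape, and the error strings are all assumptions made for illustration, and only `System.Text.Json` is used.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of LM-Kit.NET). Assumes a tool call shaped like
// {"name": "<tool>", "arguments": { ... }} and a map of tool name -> required fields.
public static class ToolCallValidator
{
    public static bool TryValidate(
        string rawJson,
        Dictionary<string, Dictionary<string, JsonValueKind>> tools,
        out string error)
    {
        error = "";
        JsonDocument doc;
        try
        {
            doc = JsonDocument.Parse(rawJson);
        }
        catch (JsonException)
        {
            error = "malformed JSON";
            return false;
        }

        using (doc)
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
            {
                error = "call is not a JSON object";
                return false;
            }

            // Step 2 failure: the model named a tool that was never registered.
            if (!root.TryGetProperty("name", out JsonElement name) ||
                name.ValueKind != JsonValueKind.String)
            {
                error = "missing tool name";
                return false;
            }
            string toolName = name.GetString() ?? "";
            if (!tools.TryGetValue(toolName, out Dictionary<string, JsonValueKind> required))
            {
                error = $"unknown tool '{toolName}'";
                return false;
            }

            // Step 3 failure: arguments are missing or have the wrong JSON type.
            if (!root.TryGetProperty("arguments", out JsonElement args) ||
                args.ValueKind != JsonValueKind.Object)
            {
                error = "missing arguments object";
                return false;
            }
            foreach (KeyValuePair<string, JsonValueKind> field in required)
            {
                if (!args.TryGetProperty(field.Key, out JsonElement value))
                {
                    error = $"missing required field '{field.Key}'";
                    return false;
                }
                if (value.ValueKind != field.Value)
                {
                    error = $"field '{field.Key}' should be {field.Value} but is {value.ValueKind}";
                    return false;
                }
            }
            return true;
        }
    }
}
```

With a single registered tool `get_weather` that requires a string `city`, a call naming `get_wether` is rejected as an unknown tool, and a call passing `"city": 42` is rejected for its type. Note that JSON booleans surface as `JsonValueKind.True` or `JsonValueKind.False`, so a real validator would need a dedicated case for them.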
Recommended Models by Use Case
Agentic Workflows (Multi-Step Tool Use)
Models that excel at ReAct-style reasoning with multiple tool calls per task:
| Model | Parameters | Context | Strengths |
|---|---|---|---|
| gptoss:20b | 20B | 131K | Advanced reasoning, long context, strong JSON generation |
| glm4.7-flash | ~30B (MoE) | 131K | Top performance in its class, agentic tasks, math, coding |
| qwen3.5:27b | 27B | 128K | Excellent multilingual tool calling, thinking mode |
| qwen3.5:9b | 9B | 128K | Best balance of quality and speed for most hardware |
Agentic Coding (File System + Web Search Tools)
Models optimized for reading, writing, and navigating code with tool calls:
| Model | Parameters | Context | Strengths |
|---|---|---|---|
| qwen3-coder:30b-a3b | ~30B (MoE, 3.3B active) | 262K | Purpose-built for agentic coding, 128 experts, native tool calling |
| devstral-small2 | 24B | 393K | Top SWE-bench scores among open models under 30B, vision capable |
| qwen3.5:9b | 9B | 128K | Strong code generation with general-purpose versatility |
| gptoss:20b | 20B | 131K | Advanced reasoning with long context for large codebases |
→ Try it: Code Analysis Assistant · Code Writing Assistant
Simple Tool Calling (1-2 Tools, Straightforward Tasks)
| Model | Parameters | Context | Strengths |
|---|---|---|---|
| qwen3.5:4b | 4B | 128K | Good tool calling for its size, fast on GPU |
| phi4-mini:3.8b | 3.8B | 16K | Compact, efficient, solid JSON generation |
| gemma4:e4b | 4B | 128K | Strong instruction following |
Not Recommended for Tool Calling
| Model or Configuration | Why |
|---|---|
| Under 3B parameters | Unreliable JSON generation, frequent hallucination of tool names |
| Heavy quantization (Q2, Q3) | Degrades JSON structure accuracy significantly |
| Models without FunctionCalling flag | Not trained or optimized for structured tool output |
The FunctionCalling Capability Flag
Models in the LM-Kit.NET catalog are tagged with capability flags. Check for FunctionCalling to confirm a model supports tool use:
using System;
using LMKit.Model;

// Look up the model's catalog card and check its capability flags.
var card = ModelCard.GetByModelID("qwen3.5:9b");
if (card.Capabilities.HasFlag(ModelCapabilities.FunctionCalling))
    Console.WriteLine("This model supports function calling");
Models with this flag have been tested for reliable JSON schema adherence and multi-turn tool interactions.
Tips for Reliable Tool Calling
1. Use the Largest Model Your Hardware Allows
Larger models generally produce more reliable tool calls. If your hardware can run an 8B-class model, prefer it over a 4B-class model for agentic tasks.
2. Keep Tool Definitions Concise
Tool descriptions and parameter schemas consume context tokens. Concise, clear definitions help the model understand the tools better. Avoid registering tools the agent does not need.
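As a rough way to see how much context a set of tool definitions takes up, the sketch below estimates token counts from text length. The `ToolBudget` class is hypothetical, and the four-characters-per-token ratio is only a common rule of thumb for English text and JSON, not the behavior of any specific tokenizer, so treat the numbers as order-of-magnitude estimates.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of LM-Kit.NET) for budgeting tool definitions.
public static class ToolBudget
{
    // Assumption: roughly 4 characters per token. Rounds up so short strings count as 1 token.
    public static int EstimateTokens(string text)
    {
        return (text.Length + 3) / 4;
    }

    // Sums the estimate over every serialized tool definition (description plus schema).
    public static int EstimateTotal(IEnumerable<string> toolDefinitions)
    {
        return toolDefinitions.Sum(d => EstimateTokens(d));
    }

    // Share of the context window the definitions would occupy, from 0.0 to 1.0 and beyond.
    public static double FractionOfContext(int tokens, int contextSize)
    {
        return (double)tokens / contextSize;
    }
}
```

Running `EstimateTotal` over the serialized definitions you plan to register, then comparing the result with the model's context size, gives a quick signal of whether a toolbox is crowding out room for the conversation and tool results.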
3. Use Q4_K_M Quantization
The Q4_K_M quantization (default in the LM-Kit.NET catalog) provides the best balance of quality and size. Avoid Q2 or Q3 quantizations for tool-calling agents.
4. Limit the Number of Registered Tools
Models perform better with fewer, well-defined tools than with a large toolbox. Register only the tools relevant to the current task.
5. Set MaxIterations Appropriately
Prevent runaway tool-calling loops:
var agent = Agent.CreateBuilder(model)
    .WithTools(tools => { /* ... */ })
    .WithMaxIterations(10) // Default: 10. Increase for complex tasks.
    .Build();
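The effect of an iteration cap can be illustrated with a small stand-alone loop. This is a conceptual sketch only, not how LM-Kit.NET implements its agent loop: `BoundedLoop`, `Run`, and the `step` delegate are hypothetical names, and `step` stands in for one round of "ask the model, maybe call a tool".

```csharp
using System;

// Hypothetical illustration of an iteration cap (not LM-Kit.NET's implementation).
public static class BoundedLoop
{
    // step receives the 1-based iteration number and returns true once a final
    // answer has been produced, or false if another tool call is still needed.
    public static (bool Finished, int IterationsUsed) Run(Func<int, bool> step, int maxIterations)
    {
        for (int i = 1; i <= maxIterations; i++)
        {
            if (step(i))
            {
                return (true, i);
            }
        }
        // The cap was reached without a final answer: stop instead of looping forever.
        return (false, maxIterations);
    }
}
```

A task that finishes on its third step returns `(true, 3)` with a cap of 10, while a step that never finishes returns `(false, 10)`. A cap that is too low cuts off legitimate multi-step tasks, which is why complex workflows may need a higher value.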
Related Content
- What is function calling and tool use in LM-Kit.NET?: How function calling works with ITool and LMFunction.
- How do I prevent an agent from misusing tools?: Permission policies for production safety.
- How do I choose the right model size for my hardware?: Memory estimation and hardware matching.
- Model Catalog: Browse all models with capability flags and hardware recommendations.
- Code Analysis Assistant Demo: Read-only code assistant using FileSystemRead, FileSystemList, FileSystemSearch, and WebSearch.
- Code Writing Assistant Demo: Code assistant with file write capabilities using five built-in tools.