Which Models Work Best for Function Calling and Tool Use?

TL;DR

Not all models handle tool calling equally well. Function calling requires the model to generate valid JSON with correct argument names and types, which smaller and heavily quantized models often struggle with. For reliable tool use, prefer models with the FunctionCalling capability flag. Qwen 3.5 (4B+), GLM 4.7 Flash, and GPT-OSS 20B are the strongest choices in the LM-Kit.NET catalog. For agentic workflows with multi-step reasoning, use the largest model your hardware supports.

How Function Calling Works at the Model Level

When a model uses a tool, it must:

1. Decide whether to call a tool or generate text.
2. Select the right tool from the available list.
3. Generate valid JSON matching the tool's input schema exactly.
4. Interpret the tool result and continue reasoning.

Steps 2 and 3 are where models fail most often. Smaller models may hallucinate tool names, produce malformed JSON, or omit required fields. LM-Kit.NET corrects minor JSON formatting issues, but that correction does not replace a capable model: the stronger the model, the more reliable its tool selection and arguments.
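The failure modes described above (unknown tool names, malformed JSON, missing or mistyped fields) can be checked before a call is dispatched. The following is a minimal sketch using only `System.Text.Json`. The `get_weather` tool, its fields, the `name`/`arguments` call shape, and the `Validate` helper are all hypothetical, and this is not a description of how LM-Kit.NET validates calls internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical registry: tool name -> required argument names and expected JSON kinds.
var tools = new Dictionary<string, Dictionary<string, JsonValueKind>>
{
    ["get_weather"] = new Dictionary<string, JsonValueKind>
    {
        ["city"] = JsonValueKind.String,
        ["days"] = JsonValueKind.Number,
    },
};

// Returns "ok" for a valid call, otherwise a description of the first problem found.
string Validate(string rawCall)
{
    JsonDocument doc;
    try { doc = JsonDocument.Parse(rawCall); }
    catch (JsonException) { return "malformed JSON"; }

    using (doc)
    {
        JsonElement root = doc.RootElement;

        // The call must be an object carrying a string "name".
        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("name", out JsonElement name) ||
            name.ValueKind != JsonValueKind.String)
            return "missing tool name";

        // Step 2 failure: the model named a tool that was never registered.
        string toolName = name.GetString() ?? "";
        if (!tools.TryGetValue(toolName, out var schema))
            return $"unknown tool '{toolName}'";

        if (!root.TryGetProperty("arguments", out JsonElement args) ||
            args.ValueKind != JsonValueKind.Object)
            return "missing arguments object";

        // Step 3 failure: a required field is absent or has the wrong JSON type.
        foreach (var (field, kind) in schema)
        {
            if (!args.TryGetProperty(field, out JsonElement value))
                return $"missing required field '{field}'";
            if (value.ValueKind != kind)
                return $"field '{field}' has the wrong type";
        }
        return "ok";
    }
}

Console.WriteLine(Validate("{\"name\":\"get_weather\",\"arguments\":{\"city\":\"Oslo\",\"days\":3}}"));
Console.WriteLine(Validate("{\"name\":\"get_wether\",\"arguments\":{\"city\":\"Oslo\",\"days\":3}}"));
Console.WriteLine(Validate("{\"name\":\"get_weather\",\"arguments\":{\"city\":\"Oslo\"}}"));
```

A check like this only detects bad calls. What to do next (retry, feed the error back to the model, or abort) is a separate design decision.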

Recommended Models by Use Case

Agentic Workflows (Multi-Step Tool Use)

Models that excel at ReAct-style reasoning with multiple tool calls per task:

| Model | Parameters | Context (tokens) | Strengths |
| --- | --- | --- | --- |
| `gptoss:20b` | 20B | 131K | Advanced reasoning, long context, strong JSON generation |
| `glm4.7-flash` | ~30B (MoE) | 131K | Top performance in its class, agentic tasks, math, coding |
| `qwen3.6:27b` | 27B | 128K | Excellent multilingual tool calling, thinking mode |
| `qwen3.5:9b` | 9B | 128K | Best balance of quality and speed for most hardware |

Agentic Coding (File System + Web Search Tools)

Models optimized for reading, writing, and navigating code with tool calls:

| Model | Parameters | Context (tokens) | Strengths |
| --- | --- | --- | --- |
| `qwen3-coder:30b-a3b` | ~30B (MoE, 3.3B active) | 262K | Purpose-built for agentic coding, 128 experts, native tool calling |
| `devstral-small2` | 24B | 393K | Top SWE-bench scores among open models under 30B, vision capable |
| `qwen3.5:9b` | 9B | 128K | Strong code generation with general-purpose versatility |
| `gptoss:20b` | 20B | 131K | Advanced reasoning with long context for large codebases |

→ Try it: Code Analysis Assistant · Code Writing Assistant

Simple Tool Calling (1-2 Tools, Straightforward Tasks)

| Model | Parameters | Context (tokens) | Strengths |
| --- | --- | --- | --- |
| `qwen3.5:4b` | 4B | 128K | Good tool calling for its size, fast on GPU |
| `phi4-mini:3.8b` | 3.8B | 16K | Compact, efficient, solid JSON generation |
| `gemma4:e4b` | 4B | 128K | Strong instruction following |

Not Recommended for Tool Calling

| Model or configuration | Why |
| --- | --- |
| Under 3B parameters | Unreliable JSON generation, frequent hallucination of tool names |
| Heavy quantization (Q2, Q3) | Degrades JSON structure accuracy significantly |
| Models without `FunctionCalling` flag | Not trained or optimized for structured tool output |

The FunctionCalling Capability Flag

Models in the LM-Kit.NET catalog are tagged with capability flags. Check for FunctionCalling to confirm a model supports tool use:

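```csharp
using System;
using LMKit.Model;

// Look up the catalog entry for the model by its ID.
var card = ModelCard.GetByModelID("qwen3.5:9b");

// HasFlag tests whether the FunctionCalling capability is set on this model.
if (card.Capabilities.HasFlag(ModelCapabilities.FunctionCalling))
    Console.WriteLine("This model supports function calling");
```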

Models with this flag have been tested for reliable JSON schema adherence and multi-turn tool interactions.

Tips for Reliable Tool Calling

1. Use the Largest Model Your Hardware Allows

Larger models generally produce more reliable tool calls. If your hardware can run an 8B model, prefer it over a 4B model for agentic tasks.

2. Keep Tool Definitions Concise

Tool descriptions and parameter schemas consume context tokens. Concise, clear definitions help the model understand the tools better. Avoid registering tools the agent does not need.
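To get a feel for how much context tool definitions use, a rough budget check can help. The sketch below assumes about 4 characters per token, which is only a coarse rule of thumb for English text and not a real tokenizer. The `EstimateTokens` and `ShareOfContext` helpers and both sample descriptions are hypothetical.

```csharp
using System;

// Rough heuristic (an assumption, not a tokenizer): about 4 characters per token.
int EstimateTokens(string text) => (int)Math.Ceiling(text.Length / 4.0);

// Percentage of the context window consumed by a given number of tokens.
double ShareOfContext(int tokens, int contextTokens) => 100.0 * tokens / contextTokens;

// Two hypothetical descriptions for the same tool.
string verbose = "This tool can be used whenever the user asks any kind of question that " +
                 "might possibly require looking something up on the internet, and it will " +
                 "return a list of results that may or may not be relevant to the question.";
string concise = "Search the web and return the top results for a query.";

int verboseTokens = EstimateTokens(verbose);
int conciseTokens = EstimateTokens(concise);

Console.WriteLine($"verbose: ~{verboseTokens} tokens, concise: ~{conciseTokens} tokens");

// The cost is paid once per registered tool, so it scales with the size of the toolbox.
// 16,000 here stands in for a 16K context window.
Console.WriteLine($"20 verbose tools use ~{ShareOfContext(20 * verboseTokens, 16_000):F1}% of a 16K context");
```

For an accurate figure, count tokens with the tokenizer of the model you actually load; the heuristic is only good enough to compare two drafts of the same definition.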

3. Use Q4_K_M Quantization

The Q4_K_M quantization (default in the LM-Kit.NET catalog) provides the best balance of quality and size. Avoid Q2 or Q3 quantizations for tool-calling agents.
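The size side of that trade-off is simple arithmetic: weight storage is roughly parameters × bits per weight ÷ 8. The sketch below uses about 4.8 bits per weight for Q4_K_M, which is an approximation assumed here and not a figure from the LM-Kit.NET catalog. The `WeightsGB` helper is hypothetical, and the estimate ignores the KV cache and runtime overhead.

```csharp
using System;

// Approximate size of the weights alone, in gigabytes (10^9 bytes).
// bitsPerWeight is an assumption supplied by the caller, not a catalog value.
double WeightsGB(double parameters, double bitsPerWeight) => parameters * bitsPerWeight / 8 / 1e9;

double fp16 = WeightsGB(9e9, 16);    // 9B parameters stored as 16-bit floats
double q4km = WeightsGB(9e9, 4.8);   // same model at an assumed ~4.8 bits per weight

Console.WriteLine($"9B at 16 bits per weight:   about {fp16:F1} GB");
Console.WriteLine($"9B at ~4.8 bits per weight: about {q4km:F1} GB");
```

Under these assumptions a 9B model drops from about 18 GB to about 5.4 GB of weights, which is why a 4-bit quantization often decides whether a model fits on a given GPU at all.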

4. Limit the Number of Registered Tools

Models perform better with fewer, well-defined tools than with a large toolbox. Register only the tools relevant to the current task.

5. Set MaxIterations Appropriately

Prevent runaway tool-calling loops:

```csharp
var agent = Agent.CreateBuilder(model)
    .WithTools(tools => { /* ... */ })
    .WithMaxIterations(10)   // Default: 10. Increase for complex tasks.
    .Build();
```
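To illustrate what an iteration cap guards against, here is a generic sketch of a capped loop. It does not use or reproduce LM-Kit.NET's agent loop: `RunWithCap` and the `step` delegate are hypothetical stand-ins for "one model turn plus its tool call".

```csharp
using System;

// Runs until the step reports a final answer or the cap is reached.
// `step` receives the 1-based iteration number and returns (Done, Text).
(string Result, int Iterations) RunWithCap(Func<int, (bool Done, string Text)> step, int maxIterations)
{
    for (int i = 1; i <= maxIterations; i++)
    {
        var (done, text) = step(i);
        if (done) return (text, i);
    }
    // The cap was hit without a final answer: stop instead of looping forever.
    return ("stopped: iteration limit reached", maxIterations);
}

// A step that never finishes, simulating a model stuck re-calling the same tool.
var stuck = RunWithCap(_ => (false, ""), 10);
Console.WriteLine($"{stuck.Result} after {stuck.Iterations} iterations");

// A step that produces its final answer on the third turn.
var finished = RunWithCap(i => (i == 3, "final answer"), 10);
Console.WriteLine($"{finished.Result} after {finished.Iterations} iterations");
```

A cap that is too low cuts off legitimate multi-step tasks, while one that is too high lets a confused model burn time and tokens, so the right value depends on how many tool calls a typical task needs.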

Related Topics

What is function calling and tool use in LM-Kit.NET?: How function calling works with ITool and LMFunction.
How do I prevent an agent from misusing tools?: Permission policies for production safety.
How do I choose the right model size for my hardware?: Memory estimation and hardware matching.
Model Catalog: Browse all models with capability flags and hardware recommendations.
Code Analysis Assistant Demo: Read-only code assistant using FileSystemRead, FileSystemList, FileSystemSearch, and WebSearch.
Code Writing Assistant Demo: Code assistant with file write capabilities using five built-in tools.
