
Which Models Work Best for Function Calling and Tool Use?


TL;DR

Not all models handle tool calling equally. Function calling requires the model to generate valid JSON with correct argument names and types, which smaller and heavily quantized models often struggle with. For reliable tool use, prefer models with the FunctionCalling capability flag. Qwen 3.5 (4B+), GLM 4.7 Flash, and GPT-OSS 20B are the strongest choices in the LM-Kit.NET catalog. For agentic workflows with multi-step reasoning, use the largest model your hardware supports.


How Function Calling Works at the Model Level

When a model uses a tool, it must:

  1. Decide whether to call a tool (versus generating text).
  2. Select the right tool from the available list.
  3. Generate valid JSON matching the tool's input schema exactly.
  4. Interpret the tool result and continue reasoning.

Steps 2 and 3 are where models fail most often. Smaller models may hallucinate tool names, produce malformed JSON, or omit required fields. LM-Kit.NET corrects minor JSON formatting issues, but a stronger model still produces more reliable tool calls.
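
The failure modes above (an unknown tool name, malformed JSON, a missing required field) can be checked mechanically before a call is executed. The following is a minimal sketch using only System.Text.Json; it is not LM-Kit.NET's internal validation. The ToolCallCheck class, the TryValidate method, and the { "name": ..., "arguments": { ... } } call shape are assumptions made for illustration, and the sketch checks only that required arguments are present, not their types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the LM-Kit.NET API.
public static class ToolCallCheck
{
    // Returns true when rawJson names a known tool and supplies every required argument.
    // On failure, error describes the first problem found.
    public static bool TryValidate(
        string rawJson,
        IReadOnlyDictionary<string, string[]> requiredArgsByTool,
        out string error)
    {
        error = "";
        JsonDocument doc;
        try
        {
            doc = JsonDocument.Parse(rawJson);
        }
        catch (JsonException)
        {
            error = "malformed JSON";
            return false;
        }

        using (doc)
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object)
            {
                error = "call is not a JSON object";
                return false;
            }

            // Assumed call shape: { "name": "...", "arguments": { ... } }
            if (!root.TryGetProperty("name", out JsonElement nameElement)
                || nameElement.ValueKind != JsonValueKind.String)
            {
                error = "missing tool name";
                return false;
            }

            string toolName = nameElement.GetString() ?? "";
            if (!requiredArgsByTool.TryGetValue(toolName, out var required))
            {
                error = "unknown tool: " + toolName;
                return false;
            }

            if (!root.TryGetProperty("arguments", out JsonElement args)
                || args.ValueKind != JsonValueKind.Object)
            {
                error = "missing arguments object";
                return false;
            }

            foreach (string field in required)
            {
                if (!args.TryGetProperty(field, out _))
                {
                    error = "missing required field: " + field;
                    return false;
                }
            }
            return true;
        }
    }
}
```

With a registry that maps a hypothetical get_weather tool to the required argument city, a call naming get_wether would be rejected as an unknown tool, and a call with an empty arguments object would be rejected for the missing city field.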


Agentic Workflows (Multi-Step Tool Use)

Models that excel at ReAct-style reasoning with multiple tool calls per task:

| Model | Parameters | Context | Strengths |
|---|---|---|---|
| gptoss:20b | 20B | 131K | Advanced reasoning, long context, strong JSON generation |
| glm4.7-flash | ~30B (MoE) | 131K | Top performance in its class, agentic tasks, math, coding |
| qwen3.5:27b | 27B | 128K | Excellent multilingual tool calling, thinking mode |
| qwen3.5:9b | 9B | 128K | Best balance of quality and speed for most hardware |

Agentic Coding (File System + Web Search Tools)

Models optimized for reading, writing, and navigating code with tool calls:

| Model | Parameters | Context | Strengths |
|---|---|---|---|
| qwen3-coder:30b-a3b | ~30B (MoE, 3.3B active) | 262K | Purpose-built for agentic coding, 128 experts, native tool calling |
| devstral-small2 | 24B | 393K | Top SWE-bench scores among open models under 30B, vision capable |
| qwen3.5:9b | 9B | 128K | Strong code generation with general-purpose versatility |
| gptoss:20b | 20B | 131K | Advanced reasoning with long context for large codebases |

→ Try it: Code Analysis Assistant · Code Writing Assistant

Simple Tool Calling (1-2 Tools, Straightforward Tasks)

| Model | Parameters | Context | Strengths |
|---|---|---|---|
| qwen3.5:4b | 4B | 128K | Good tool calling for its size, fast on GPU |
| phi4-mini:3.8b | 3.8B | 16K | Compact, efficient, solid JSON generation |
| gemma4:e4b | 4B | 128K | Strong instruction following |

Models to Avoid for Tool Calling

| Model type | Why |
|---|---|
| Under 3B parameters | Unreliable JSON generation, frequent hallucination of tool names |
| Heavy quantization (Q2, Q3) | Degrades JSON structure accuracy significantly |
| Models without FunctionCalling flag | Not trained or optimized for structured tool output |

The FunctionCalling Capability Flag

Models in the LM-Kit.NET catalog are tagged with capability flags. Check for FunctionCalling to confirm a model supports tool use:

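using System;
using LMKit.Model;

// Look up the catalog entry for the model by its ID.
var card = ModelCard.GetByModelID("qwen3.5:9b");

// HasFlag tests whether a single capability is set on the card.
if (card.Capabilities.HasFlag(ModelCapabilities.FunctionCalling))
    Console.WriteLine("This model supports function calling");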

Models with this flag have been tested for reliable JSON schema adherence and multi-turn tool interactions.


Tips for Reliable Tool Calling

1. Use the Largest Model Your Hardware Allows

Bigger models produce more reliable tool calls. If your hardware can run a 9B model, prefer it over a 4B model for agentic tasks.

2. Keep Tool Definitions Concise

Tool descriptions and parameter schemas consume context tokens. Concise, clear definitions help the model understand the tools better. Avoid registering tools the agent does not need.
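
To get a feel for how quickly tool definitions add up, their size can be estimated before registration. The sketch below assumes the common rule of thumb of roughly four characters per token, which is only an approximation; real counts depend on the model's tokenizer. ToolBudget and its methods are hypothetical names, not LM-Kit.NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the LM-Kit.NET API.
public static class ToolBudget
{
    // Assumption: about 4 characters per token. Rounds up so short strings still count.
    public static int EstimateTokens(string text) => (text.Length + 3) / 4;

    // Sums the estimate over every tool definition (description plus schema, as text).
    public static int EstimateTotal(IEnumerable<string> toolDefinitions) =>
        toolDefinitions.Sum(d => EstimateTokens(d));
}
```

Comparing the estimated total against the model's context size (for example, 16K for phi4-mini:3.8b) gives a quick sense of how much room is left for the conversation itself.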

3. Use Q4_K_M Quantization

The Q4_K_M quantization (default in the LM-Kit.NET catalog) provides the best balance of quality and size. Avoid Q2 or Q3 quantizations for tool-calling agents.

4. Limit the Number of Registered Tools

Models perform better with fewer, well-defined tools than with a large toolbox. Register only the tools relevant to the current task.

5. Set MaxIterations Appropriately

Prevent runaway tool-calling loops:

var agent = Agent.CreateBuilder(model)
    .WithTools(tools => { /* ... */ })
    .WithMaxIterations(10)   // Default: 10. Increase for complex tasks.
    .Build();
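
To illustrate what an iteration cap guards against, here is a generic sketch of a bounded loop. It is not how LM-Kit.NET implements MaxIterations; BoundedLoop, Run, and the step delegate are hypothetical names used only to show the idea.

```csharp
using System;

// Hypothetical illustration of an iteration cap; not the LM-Kit.NET implementation.
public static class BoundedLoop
{
    // step(i) stands in for one model turn and returns true once the model
    // produces a final answer instead of requesting another tool call.
    // Returns the iteration that finished, or -1 if the cap was reached first.
    public static int Run(Func<int, bool> step, int maxIterations)
    {
        for (int i = 1; i <= maxIterations; i++)
        {
            if (step(i))
                return i;
        }
        return -1; // the cap stopped a loop that never converged
    }
}
```

A step that never returns true would spin forever without the cap; with it, the loop ends after maxIterations turns and the caller can report the failure.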
