What is Chain-of-Thought (CoT) Reasoning?


TL;DR

Chain-of-Thought (CoT) is a prompting and planning technique that guides a language model to reason step by step before producing a final answer. Instead of jumping directly to a conclusion, the model articulates intermediate reasoning steps, dramatically improving accuracy on complex tasks like math, logic, multi-hop question answering, and planning. In LM-Kit.NET, CoT is implemented as the ChainOfThoughtHandler planning strategy, configurable via PlanningStrategy.ChainOfThought on agents, and is complemented by native model reasoning support through ReasoningLevel on conversations.


What is Chain-of-Thought?

Definition: Chain-of-Thought (CoT) is a reasoning technique where a language model generates an explicit sequence of intermediate steps before arriving at a final answer. The term was introduced by Wei et al. (2022) to describe how prompting models to "think step by step" unlocks reasoning capabilities that are latent in the model's weights but not activated by direct question-answer prompting.

Without CoT vs. With CoT

Without CoT:
  Q: "If a train travels 120 km in 1 hour, and then 180 km in 3 hours,
      what is its average speed for the entire trip?"
  A: "90 km/h"  ← Wrong (averaged the two leg speeds: (120 + 60) / 2)

With CoT:
  Q: Same question
  A: "Let's think through this step by step:
      1. Total distance = 120 + 180 = 300 km
      2. Total time = 1 + 3 = 4 hours
      3. Average speed = 300 / 4 = 75 km/h
      Final Answer: 75 km/h"  ← Correct

The intermediate steps serve two purposes: they guide the model's own token prediction toward the correct answer, and they make the reasoning transparent and auditable for the developer.


Why CoT Works

1. Decomposition

Complex problems become manageable when broken into smaller sub-problems. Each step is simpler to solve correctly than the entire problem at once: the model's probability of getting each small step right is high, which makes it far more likely that the chain as a whole reaches the correct final answer.

2. Working Memory

LLMs have no persistent memory between tokens. By writing intermediate results into the context window, CoT provides a form of external working memory. The model can "look back" at its own reasoning steps in the same way a human writes intermediate calculations on paper.

3. Latent Capability Activation

Large models contain reasoning capabilities learned during pre-training on mathematical proofs, code, scientific papers, and logical arguments. CoT prompting activates these latent pathways by matching the pattern of step-by-step reasoning the model saw during training.


CoT Variants

  Variant          | Description                                                                | Best For
  Zero-shot CoT    | Add "Let's think step by step" to the prompt. No examples needed.          | Quick reasoning boost on any task
  Few-shot CoT     | Provide examples with explicit reasoning chains before the question        | Complex domain-specific reasoning
  Self-Consistency | Generate multiple CoT paths, then vote on the most common answer           | High-stakes decisions requiring confidence
  ReAct            | Interleave reasoning (Thought) with actions (tool calls) and observations  | Agentic tasks with external data access
  Tree-of-Thought  | Explore multiple reasoning paths with backtracking                         | Problems with many possible solution paths
  Plan-and-Execute | Generate a full plan first, then execute steps sequentially                | Multi-step workflows with dependencies
  Reflection       | Generate, self-critique, then refine the answer                            | Quality-sensitive tasks requiring self-correction
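
Self-Consistency in particular can be layered on top of any CoT-capable generation API: sample several reasoning chains for the same question, then vote on the final answers. A minimal sketch of the voting step in C#, assuming you have already collected the completions and that each ends with a "Final Answer:" line (the helper names here are illustrative, not part of LM-Kit.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SelfConsistency
{
    // Extract the text after the "Final Answer:" marker, or null if absent.
    static string? ExtractFinalAnswer(string completion)
    {
        const string marker = "Final Answer:";
        int idx = completion.LastIndexOf(marker, StringComparison.OrdinalIgnoreCase);
        return idx < 0 ? null : completion[(idx + marker.Length)..].Trim();
    }

    // Majority vote across several CoT completions of the same question.
    public static string? Vote(IEnumerable<string> completions) =>
        completions
            .Select(ExtractFinalAnswer)
            .Where(a => a != null)
            .Select(a => a!)
            .GroupBy(a => a, StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault();
}
```

Each chain may take a different route to its answer; sampling with nonzero temperature and voting on the final answers trades extra tokens for higher confidence.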

Planning Strategies in LM-Kit.NET

LM-Kit.NET implements CoT and its variants as planning strategies through the PlanningStrategy enum and corresponding handler classes in the LMKit.Agents.Planning namespace:

  Strategy        | Handler                 | Description
  None            | NonePlanningHandler     | Direct response, no reasoning overhead
  ChainOfThought  | ChainOfThoughtHandler   | Step-by-step reasoning before a final answer
  ReAct           | ReActHandler            | Thought-Action-Observation loop with tool calls
  PlanAndExecute  | PlanAndExecuteHandler   | Two-phase: create a plan, then execute each step
  Reflection      | ReflectionHandler       | Generate, self-critique, then refine
  TreeOfThought   | TreeOfThoughtHandler    | Multi-path exploration with backtracking

ChainOfThoughtHandler

The ChainOfThoughtHandler injects a reasoning instruction into the prompt and parses the model's output for a structured final answer:

  • ReasoningInstruction: Customizable prompt prefix (default: "Let's think through this step by step:")
  • RequireFinalAnswerMarker: When true, the model must emit "Final Answer:" to separate reasoning from the answer
  • Instance: Singleton for default configuration
  • WithInstruction(string): Factory for custom instructions
  • WithRequiredMarker(): Factory requiring the final answer marker
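
For instance, a handler that enforces the final answer marker, so the answer can be parsed reliably out of the reasoning trace, might be wired up as follows. This is a minimal sketch built only from the members listed above and the builder pattern shown in the code examples later in this article:

```csharp
using LMKit.Agents;
using LMKit.Agents.Planning;

// Require the model to emit "Final Answer:" to terminate its reasoning.
var cotHandler = ChainOfThoughtHandler.WithRequiredMarker();

var agent = Agent.CreateBuilder(model)   // 'model' loaded as in the examples below
    .WithPlanning(cotHandler)
    .Build();
```

When no customization is needed, ChainOfThoughtHandler.Instance provides the default configuration.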

ReActHandler

The ReActHandler extends CoT with an action loop. The model alternates between:

  1. Thought: Reasoning about what to do next
  2. Action: Calling a tool with specific arguments
  3. Observation: Receiving the tool's result

This cycle continues until the model reaches a Final Answer. ReAct is the standard strategy for agents that need to interact with external systems.
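An illustrative trace of this loop, using the Tokyo/Paris question from the code examples below (the figures and exact wording are illustrative and vary by model and prompt template):

```
Thought: I need the current population of Tokyo.
Action: WebSearch("Tokyo metropolitan population")
Observation: approximately 37 million
Thought: Now I need the population of Paris.
Action: WebSearch("Paris metropolitan population")
Observation: approximately 11 million
Thought: Divide the two figures.
Action: Calculator("37000000 / 11000000")
Observation: 3.36
Final Answer: Tokyo's population is roughly 3.4 times that of Paris.
```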


Code Example

Agent with Chain-of-Thought Planning

using LMKit.Model;
using LMKit.Agents;

var model = LM.LoadFromModelID("gemma3:12b");

// Build an agent with CoT planning
var agent = Agent.CreateBuilder(model)
    .WithPlanning(PlanningStrategy.ChainOfThought)
    .WithSystemPrompt("You are a precise analytical assistant. Always show your reasoning.")
    .Build();

// The agent will reason step by step before answering
using var executor = new AgentExecutor();
var result = await executor.ExecuteAsync(agent, "What is 15% of 340, rounded to the nearest integer?");
// Model output includes visible reasoning steps + "Final Answer: 51"

Agent with ReAct (CoT + Tools)

using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;

var model = LM.LoadFromModelID("glm4.7-flash");

// ReAct combines step-by-step reasoning with tool calls
var agent = Agent.CreateBuilder(model)
    .WithPlanning(PlanningStrategy.ReAct)
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.WebSearch);
        tools.Register(BuiltInTools.Calculator);
    })
    .Build();

// The agent will: Think → Search → Observe → Think → Calculate → Final Answer
using var executor = new AgentExecutor();
var result = await executor.ExecuteAsync(agent,
    "What is the population of Tokyo divided by the population of Paris?");

Custom CoT Instruction

using LMKit.Agents;
using LMKit.Agents.Planning;

// Customize the reasoning instruction for a specific domain
var cotHandler = ChainOfThoughtHandler.WithInstruction(
    "Analyze this medical report systematically:\n" +
    "1. Identify key findings\n" +
    "2. Note any abnormal values\n" +
    "3. Consider possible implications\n" +
    "4. Provide your assessment");

var agent = Agent.CreateBuilder(model)
    .WithPlanning(cotHandler)
    .Build();

Native Model Reasoning Level

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;

var model = LM.LoadFromModelID("qwen3:14b");
using var chat = new MultiTurnConversation(model);

// Enable the model's built-in reasoning mode (thinking tokens)
chat.ReasoningLevel = ReasoningLevel.High;

// The model internally generates chain-of-thought tokens
// before producing the visible response
var answer = await chat.SubmitAsync("Solve: x^2 + 5x + 6 = 0");

ReasoningLevel controls the model's native thinking mode (available on models like Qwen 3 that support it). This is distinct from the ChainOfThoughtHandler planning strategy: reasoning level triggers internal thinking tokens at the model level, while CoT planning structures the visible prompt and output format at the application level. Both can be used together.


When to Use CoT

  Scenario                              | Strategy        | Why
  Math, logic, or multi-step reasoning  | ChainOfThought  | Makes intermediate steps explicit and auditable
  Tasks requiring external data         | ReAct           | Combines reasoning with tool calls
  Complex workflows with ordered steps  | PlanAndExecute  | Plans the full sequence before acting
  Quality-critical output               | Reflection      | Self-correction improves accuracy
  Problems with many possible paths     | TreeOfThought   | Explores alternatives before committing
  Simple, straightforward questions     | None            | CoT adds unnecessary overhead and latency

The Cost of CoT

CoT is not free. Each reasoning step consumes tokens from the context window and adds latency. For simple factual questions ("What is the capital of France?"), CoT wastes tokens on unnecessary reasoning. Use CoT selectively for tasks where explicit reasoning demonstrably improves accuracy.
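
One way to apply CoT selectively is to choose the planning strategy per request, falling back to PlanningStrategy.None for simple lookups. A hedged sketch; the classification heuristic here is deliberately crude and illustrative, not part of LM-Kit.NET:

```csharp
using System;
using System.Linq;
using LMKit.Agents;

// Illustrative routing: send short factual lookups to a no-planning agent,
// and questions that look multi-step to a CoT agent.
static PlanningStrategy ChooseStrategy(string question)
{
    bool looksMultiStep =
        question.Contains("calculate", StringComparison.OrdinalIgnoreCase) ||
        question.Contains("step", StringComparison.OrdinalIgnoreCase) ||
        question.Count(c => c == '?') > 1 ||
        question.Length > 200;

    return looksMultiStep ? PlanningStrategy.ChainOfThought : PlanningStrategy.None;
}
```

In production, a cheap classifier model or a cached per-task-type decision is a common replacement for a keyword heuristic like this.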


Key Terms

  • Chain-of-Thought (CoT): A technique where the model generates intermediate reasoning steps before the final answer.
  • Zero-Shot CoT: Adding "Let's think step by step" to the prompt without providing examples.
  • Few-Shot CoT: Providing examples with explicit reasoning chains as context before the question.
  • ReAct (Reasoning and Acting): Interleaving thoughts, tool calls, and observations.
  • Tree-of-Thought (ToT): Exploring multiple reasoning branches and backtracking from dead ends.
  • Self-Consistency: Generating multiple reasoning chains and selecting the most common final answer.
  • Reasoning Level: A model-native parameter controlling how much internal thinking the model performs.
  • Final Answer Marker: A delimiter ("Final Answer:") that separates the reasoning trace from the answer.



Summary

Chain-of-Thought (CoT) is the technique of guiding language models to reason step by step before producing a final answer. It dramatically improves accuracy on complex tasks by decomposing problems, providing working memory through the context window, and activating latent reasoning capabilities. In LM-Kit.NET, CoT is implemented through the ChainOfThoughtHandler planning strategy, with related strategies for tool-augmented reasoning (ReAct), multi-path exploration (Tree-of-Thought), and self-correction (Reflection). The ReasoningLevel property provides an additional, model-native reasoning knob. CoT is essential for agentic AI applications where accuracy, transparency, and auditability of the reasoning process matter.
