Control Reasoning and Chain-of-Thought in Conversations
Reasoning models produce intermediate "thinking" tokens before generating their final answer. This chain-of-thought process improves accuracy on complex tasks like math, logic, and multi-step analysis, but costs extra tokens and time. LM-Kit.NET's ReasoningLevel property gives you a dial to control how much reasoning a model performs. This tutorial shows how to configure reasoning levels for different task types and measure the quality and performance trade-offs.
Why Reasoning Control Matters
Two real-world problems that reasoning control solves:
- Balancing speed vs. accuracy per task. A customer support bot answering "What are your business hours?" doesn't need chain-of-thought reasoning. But the same bot solving "Calculate the total cost with the 15% loyalty discount, $8.50 shipping, and the buy-2-get-1-free promotion" benefits from step-by-step reasoning. Setting `ReasoningLevel.None` for simple queries and `ReasoningLevel.High` for complex ones optimizes both latency and correctness (see the sketch after this list).
- Controlling token budgets in production. Reasoning tokens count toward context usage and generation time. For high-throughput systems, setting `ReasoningLevel.Low` caps the reasoning overhead while still allowing the model to show its work on tricky inputs.
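A minimal sketch of that per-query routing, assuming `model` is an already-loaded reasoning-capable model (loading is covered in Step 3) and that the `isComplex` flag comes from your own request classification:

```csharp
// Route cheap queries to a non-reasoning conversation and hard ones to a reasoning one.
// Assumes `model` is loaded as shown in Step 3; `isComplex` is your own classification flag.
var simpleChat = new SingleTurnConversation(model) { ReasoningLevel = ReasoningLevel.None };
var complexChat = new SingleTurnConversation(model) { ReasoningLevel = ReasoningLevel.High };

TextGenerationResult Answer(string query, bool isComplex) =>
    isComplex ? complexChat.Submit(query) : simpleChat.Submit(query);
```

Step 6 develops this idea into a heuristic selector that chooses between all four levels.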
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| Chat model | A model with reasoning capability (e.g., qwen3:4b, qwen3:8b) |
| VRAM | 4 GB+ |
Models without reasoning support will ignore the ReasoningLevel setting and produce normal completions.
Step 1: Create the Project
dotnet new console -n ReasoningControl
cd ReasoningControl
dotnet add package LM-Kit.NET
Step 2: Understand Reasoning Levels
┌───────────────────────────────────────────────────┐
│ ReasoningLevel Spectrum │
├──────────┬───────────┬───────────┬────────────────┤
│ None │ Low │ Medium │ High │
│ │ │ │ │
│ No extra │ Brief │ Balanced │ Deep │
│ thinking │ scratch │ reasoning │ deliberation │
│ │ notes │ │ │
│ Fastest │ Fast │ Moderate │ Slowest │
│ Simple │ Moderate │ Complex │ Very complex │
│ queries │ tasks │ problems │ reasoning │
└──────────┴───────────┴───────────┴────────────────┘
| Level | Internal Behavior | Token Overhead | Best For |
|---|---|---|---|
| `None` | No reasoning tokens produced | Zero | Simple Q&A, lookup, greetings |
| `Low` | Minimal scratch space when helpful | Low | Moderate tasks, formatting |
| `Medium` | Balanced reasoning depth | Moderate | Multi-step problems, analysis |
| `High` | Maximum deliberation | High | Math, logic puzzles, complex code |
Step 3: Configure Reasoning in SingleTurnConversation
using System.Text;
using System.Diagnostics;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load a reasoning-capable model
// ──────────────────────────────────────
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p =>
{
Console.Write($"\r Loading: {p * 100:F0}% ");
return true;
});
Console.WriteLine($"\n Model loaded: {model.ModelName}\n");
// ──────────────────────────────────────
// 2. No reasoning: fast, simple answers
// ──────────────────────────────────────
Console.WriteLine("=== ReasoningLevel.None ===\n");
var fastChat = new SingleTurnConversation(model)
{
ReasoningLevel = ReasoningLevel.None,
SystemPrompt = "You are a helpful assistant. Be concise.",
MaximumCompletionTokens = 256
};
var sw = Stopwatch.StartNew();
TextGenerationResult fastResult = fastChat.Submit("What is the capital of France?");
sw.Stop();
Console.WriteLine($"Answer: {fastResult.Completion}");
Console.WriteLine($"Tokens: {fastResult.GeneratedTokenCount}, Time: {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Speed: {fastResult.TokenGenerationRate:F1} tokens/s\n");
// ──────────────────────────────────────
// 3. High reasoning: thorough analysis
// ──────────────────────────────────────
Console.WriteLine("=== ReasoningLevel.High ===\n");
var reasoningChat = new SingleTurnConversation(model)
{
ReasoningLevel = ReasoningLevel.High,
SystemPrompt = "You are a math tutor. Show your work step by step.",
MaximumCompletionTokens = 1024
};
sw.Restart();
TextGenerationResult reasoningResult = reasoningChat.Submit(
"A store offers 20% off all items. If you buy 3 shirts at $45 each " +
"and 2 pants at $60 each, what is the total after the discount? " +
"Also calculate the per-item average cost.");
sw.Stop();
Console.WriteLine($"Answer: {reasoningResult.Completion}");
Console.WriteLine($"Tokens: {reasoningResult.GeneratedTokenCount}, Time: {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Speed: {reasoningResult.TokenGenerationRate:F1} tokens/s\n");
Step 4: Configure Reasoning in MultiTurnConversation
// ──────────────────────────────────────
// Multi-turn conversation with reasoning
// ──────────────────────────────────────
Console.WriteLine("=== Multi-Turn with Medium Reasoning ===\n");
var multiTurn = new MultiTurnConversation(model)
{
ReasoningLevel = ReasoningLevel.Medium,
SystemPrompt = "You are a logical reasoning assistant. Think through problems carefully.",
MaximumCompletionTokens = 512
};
// First turn: pose the problem
TextGenerationResult turn1 = multiTurn.Submit(
"I have a 3-gallon jug and a 5-gallon jug. How do I measure exactly 4 gallons?");
Console.WriteLine($"Turn 1: {turn1.Completion}\n");
// Follow-up: ask for verification
TextGenerationResult turn2 = multiTurn.Submit(
"Can you verify each step by tracking the water level in both jugs?");
Console.WriteLine($"Turn 2: {turn2.Completion}\n");
Console.WriteLine($"Total tokens generated: " +
$"{turn1.GeneratedTokenCount + turn2.GeneratedTokenCount}");
Step 5: Compare Reasoning Levels Side by Side
// ──────────────────────────────────────
// Benchmark different reasoning levels
// ──────────────────────────────────────
Console.WriteLine("=== Reasoning Level Comparison ===\n");
string testPrompt = "If all roses are flowers and some flowers fade quickly, " +
"can we conclude that some roses fade quickly? Explain your reasoning.";
ReasoningLevel[] levels = { ReasoningLevel.None, ReasoningLevel.Low,
ReasoningLevel.Medium, ReasoningLevel.High };
Console.WriteLine($"{"Level",-10} {"Tokens",-10} {"Time (ms)",-12} {"Speed (t/s)",-12}");
Console.WriteLine(new string('─', 50));
foreach (ReasoningLevel level in levels)
{
var chat = new SingleTurnConversation(model)
{
ReasoningLevel = level,
SystemPrompt = "You are a logic instructor.",
MaximumCompletionTokens = 512
};
sw.Restart();
TextGenerationResult result = chat.Submit(testPrompt);
sw.Stop();
Console.WriteLine($"{level,-10} {result.GeneratedTokenCount,-10} " +
$"{sw.ElapsedMilliseconds,-12} {result.TokenGenerationRate,-12:F1}");
}
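The loop above reports only cost metrics. To compare answer quality as well, one option is to re-run the same loop and save each completion for manual review; the `reasoning-comparison.md` file name and the `results` list below are illustrative choices, not part of the LM-Kit.NET API:

```csharp
// Optional: capture the completions themselves for side-by-side quality review.
// Re-runs the same prompt at each level; the file name and list are illustrative.
var results = new List<string>();

foreach (ReasoningLevel level in levels)
{
    var chat = new SingleTurnConversation(model)
    {
        ReasoningLevel = level,
        SystemPrompt = "You are a logic instructor.",
        MaximumCompletionTokens = 512
    };

    TextGenerationResult result = chat.Submit(testPrompt);
    results.Add($"## {level} ({result.GeneratedTokenCount} tokens)\n\n{result.Completion}\n");
}

File.WriteAllText("reasoning-comparison.md", string.Join("\n", results));
```

In practice you would fold this into the benchmark loop above rather than generating each answer twice.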
Step 6: Adaptive Reasoning Based on Task Complexity
Build a system that automatically selects the reasoning level based on the input:
// ──────────────────────────────────────
// Adaptive reasoning selector
// ──────────────────────────────────────
Console.WriteLine("\n=== Adaptive Reasoning ===\n");
string[] prompts =
{
"Hello, how are you?",
"Summarize the key differences between TCP and UDP.",
"Prove that the square root of 2 is irrational.",
"What color is the sky?"
};
foreach (string prompt in prompts)
{
ReasoningLevel level = SelectReasoningLevel(prompt);
var chat = new SingleTurnConversation(model)
{
ReasoningLevel = level,
MaximumCompletionTokens = 512
};
TextGenerationResult result = chat.Submit(prompt);
Console.WriteLine($"[{level}] \"{prompt}\"");
Console.WriteLine($" → {result.Completion.Split('\n')[0]}...");
Console.WriteLine($" Tokens: {result.GeneratedTokenCount}\n");
}
// ──────────────────────────────────────
// Helper: heuristic reasoning level selector
// ──────────────────────────────────────
static ReasoningLevel SelectReasoningLevel(string prompt)
{
string lower = prompt.ToLowerInvariant();
// High reasoning for math, proofs, and logic
if (lower.Contains("prove") || lower.Contains("calculate") ||
lower.Contains("solve") || lower.Contains("irrational") ||
lower.Contains("equation"))
{
return ReasoningLevel.High;
}
// Medium for analysis and comparison tasks
if (lower.Contains("compare") || lower.Contains("analyze") ||
lower.Contains("differences") || lower.Contains("explain why") ||
lower.Contains("summarize"))
{
return ReasoningLevel.Medium;
}
// Low for moderate complexity
if (lower.Length > 100 || lower.Contains("describe") || lower.Contains("list"))
{
return ReasoningLevel.Low;
}
// None for simple queries
return ReasoningLevel.None;
}
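Creating a new `SingleTurnConversation` per prompt keeps the example simple but adds a small setup cost on every request. A variation is to keep one conversation per reasoning level and reuse it. The sketch below replaces the loop above (it is not meant to run in addition), and the `conversationCache` and `GetConversation` names are illustrative:

```csharp
// Variation: reuse one conversation per reasoning level instead of creating one per prompt.
var conversationCache = new Dictionary<ReasoningLevel, SingleTurnConversation>();

SingleTurnConversation GetConversation(ReasoningLevel level)
{
    if (!conversationCache.TryGetValue(level, out SingleTurnConversation? chat))
    {
        chat = new SingleTurnConversation(model)
        {
            ReasoningLevel = level,
            MaximumCompletionTokens = 512
        };
        conversationCache[level] = chat;
    }

    return chat;
}

foreach (string prompt in prompts)
{
    ReasoningLevel level = SelectReasoningLevel(prompt);
    TextGenerationResult result = GetConversation(level).Submit(prompt);
    Console.WriteLine($"[{level}] {result.GeneratedTokenCount} tokens: {result.Completion.Split('\n')[0]}");
}
```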
Step 7: Run the Application
dotnet run
Expected output pattern:
=== ReasoningLevel.None ===
Answer: The capital of France is Paris.
Tokens: 8, Time: 45ms
Speed: 177.8 tokens/s
=== ReasoningLevel.High ===
Answer: Let me work through this step by step...
Tokens: 186, Time: 1240ms
Speed: 150.0 tokens/s
=== Reasoning Level Comparison ===
Level Tokens Time (ms) Speed (t/s)
──────────────────────────────────────────────────
None 42 280 150.0
Low 78 510 152.9
Medium 134 870 154.0
High 198 1340 147.8
ReasoningLevel Reference
| Level | Enum Value | Description |
|---|---|---|
| `None` | 0 | No reasoning tokens requested or exposed. The model produces only the final answer |
| `Low` | 1 | Minimal internal deliberation. Brief scratch space used only when helpful |
| `Medium` | 2 | Balanced speed and quality. Moderate reasoning depth |
| `High` | 3 | Maximum reasoning depth. May trade off speed for thoroughness |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| `ReasoningLevel` has no visible effect | Model doesn't support reasoning | Use a reasoning-capable model (e.g., qwen3:4b or larger) |
| Tokens much higher than expected | `ReasoningLevel.High` produces many thinking tokens | Reduce to `Medium` or `Low` for less overhead |
| Error setting `ReasoningLevel` on `MultiTurnConversation` | Chat history already has messages | Set `ReasoningLevel` before the first `Submit` call |
| Same output regardless of level | Simple prompt doesn't need reasoning | Test with complex prompts (math, logic, multi-step analysis) |
Next Steps
- Control Token Sampling with Dynamic Strategies: fine-tune generation beyond reasoning level.
- Build a Conversational Assistant with Memory: reasoning in long-running conversations.
- Orchestrate Multi-Agent Workflows with Patterns: combine reasoning agents in pipelines.