Stream Agent Responses in Real Time
Waiting for an agent to finish its entire reasoning chain before showing anything to the user creates a poor experience. Users see a blank screen for seconds (or minutes for complex tasks), then a wall of text appears all at once. LM-Kit.NET's agent streaming API delivers tokens as they are generated, gives you visibility into tool calls and thinking steps as they happen, and supports multiple consumption patterns: callbacks, async enumerables, and custom stream handlers. This guide builds a fully streaming agent system with live token display, thinking visibility, tool call tracking, and a custom dashboard handler.
Why Real-Time Streaming Matters
Two production problems that streaming solves:
- Perceived latency in user-facing applications. When an agent takes 8 seconds to generate a response, the user sees nothing for 8 seconds. With streaming, the first token appears in under 200ms and the response builds progressively. Users perceive the system as fast and responsive even when total generation time is unchanged.
- Observability into agentic reasoning. Complex agents make tool calls, think through problems, and iterate. Without streaming, you get only the final answer. With streaming, you can watch each reasoning step, see which tools were called, track thinking tokens, and detect problems in real time rather than debugging after the fact.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n AgentStreaming
cd AgentStreaming
dotnet add package LM-Kit.NET
Step 2: Stream with a Simple Content Callback
The simplest streaming approach uses RunStreamingAsync with a string callback. You receive only the content text, one chunk at a time:
using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Streaming;
using LMKit.Agents.Tools.BuiltIn;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Create an agent with tools
// ──────────────────────────────────────
Agent agent = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful research assistant with access to tools.")
.WithTools(tools =>
{
tools.Register(BuiltInTools.Calculator);
tools.Register(BuiltInTools.DateTime);
tools.Register(BuiltInTools.WebSearch);
})
.WithMaxIterations(5)
.Build();
// ──────────────────────────────────────
// 3. Stream content with a simple callback
// ──────────────────────────────────────
Console.Write("Assistant: ");
AgentExecutionResult result = await agent.RunStreamingAsync(
"What is 15% of $2,340? Also, what day of the week is today?",
onContent: text => Console.Write(text));
Console.WriteLine($"\n\n[Tokens used: {result.TokenUsage}]");
The onContent callback fires for every text chunk the model generates, so the response builds on screen as it is produced.
Step 3: Stream with Full Token Metadata
For richer visibility, use the AgentStreamToken callback. Each token carries metadata including its type (content, thinking, tool call, tool result, status, or error):
// ──────────────────────────────────────
// 4. Token-level streaming with metadata
// ──────────────────────────────────────
Console.WriteLine("\n── Token-Level Streaming ──\n");
int tokenCount = 0;
AgentExecutionResult detailedResult = await agent.RunStreamingAsync(
"Explain the concept of compound interest and calculate the result of $10,000 at 5% for 10 years.",
onToken: token =>
{
tokenCount++;
switch (token.Type)
{
case AgentStreamTokenType.Content:
Console.Write(token.Text);
break;
case AgentStreamTokenType.Thinking:
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write(token.Text);
Console.ResetColor();
break;
case AgentStreamTokenType.ToolCall:
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n 🔧 Tool call: {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($"\n ✅ Result: {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.Status:
Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write($"\n ℹ️ {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.Error:
Console.ForegroundColor = ConsoleColor.Red;
Console.Write($"\n ❌ {token.Text}");
Console.ResetColor();
break;
}
});
Console.WriteLine($"\n\n[Total stream tokens: {tokenCount}]");
Console.WriteLine($"[Final content length: {detailedResult.Content.Length} chars]");
The AgentStreamTokenType enum distinguishes between:
- Content: the visible response text
- Thinking: internal reasoning tokens (when the model uses a think/reason mode)
- ToolCall: when the agent invokes a tool (includes tool name and arguments)
- ToolResult: the tool's return value
- Status: orchestration events (agent starting, iteration count)
- Error: error messages during execution
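Because every token is typed, the stream can be aggregated into a per-run trace instead of (or alongside) live display. The sketch below, assuming the RunStreamingAsync and AgentStreamToken API shown above and a hypothetical prompt, tallies how many tokens of each type a run produced:

```csharp
// Sketch: tally tokens by type to see how a run was spent
// (content vs. thinking vs. tool activity).
var counts = new Dictionary<AgentStreamTokenType, int>();

AgentExecutionResult traced = await agent.RunStreamingAsync(
    "What day of the week is today?",
    onToken: token =>
    {
        // Count every token type; no display, just instrumentation.
        counts[token.Type] = counts.TryGetValue(token.Type, out int n) ? n + 1 : 1;
    });

foreach (var (type, count) in counts)
    Console.WriteLine($"{type,-12} {count,5}");
```

A tally like this is cheap enough to leave enabled in production logging, since the callback does no console I/O per token.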
Step 4: Consume Streams with IAsyncEnumerable
For LINQ-style processing and await foreach patterns, use the StreamAsync extension method:
// ──────────────────────────────────────
// 5. IAsyncEnumerable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Async Enumerable Stream ──\n");
var contentBuilder = new StringBuilder();
int toolCallsSeen = 0;
await foreach (AgentStreamToken token in agent.StreamAsync(
"Search the web for the current population of Tokyo and calculate what percentage it is of Japan's total population."))
{
switch (token.Type)
{
case AgentStreamTokenType.Content:
contentBuilder.Append(token.Text);
Console.Write(token.Text);
break;
case AgentStreamTokenType.ToolCall:
toolCallsSeen++;
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n [Tool #{toolCallsSeen}] {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($"\n [Result] {token.Text}\n");
Console.ResetColor();
break;
}
}
Console.WriteLine($"\n\nFinal content length: {contentBuilder.Length} chars");
Console.WriteLine($"Tool calls observed: {toolCallsSeen}");
The StreamAsync method returns an IAsyncEnumerable<AgentStreamToken> that you can filter, transform, or aggregate using standard async LINQ. For content-only streaming, use StreamContentAsync which yields plain strings:
// ──────────────────────────────────────
// 6. Content-only async enumerable
// ──────────────────────────────────────
Console.Write("\nAssistant: ");
await foreach (string text in agent.StreamContentAsync("What is the meaning of life?"))
{
Console.Write(text);
}
Console.WriteLine();
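The filter/transform/aggregate claim above can be sketched with async LINQ operators. This example assumes the System.Linq.Async NuGet package (which provides `Where` over `IAsyncEnumerable<T>`) and a hypothetical prompt; it surfaces only tool activity and drops content tokens:

```csharp
// Requires: dotnet add package System.Linq.Async
// Filter the token stream down to tool events only.
await foreach (AgentStreamToken token in agent
    .StreamAsync("Compute 7! and report the result.")
    .Where(t => t.Type == AgentStreamTokenType.ToolCall
             || t.Type == AgentStreamTokenType.ToolResult))
{
    Console.WriteLine($"[{token.Type}] {token.Text}");
}
```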
Step 5: Build a Custom Stream Handler
For production applications, implement IAgentStreamHandler to build a structured streaming pipeline with lifecycle hooks:
// ──────────────────────────────────────
// 7. Custom stream handler
// ──────────────────────────────────────
using System.Diagnostics; // Stopwatch lives in System.Diagnostics
public sealed class DashboardStreamHandler : IAgentStreamHandler
{
private readonly Stopwatch _stopwatch = new();
private int _tokenCount;
private int _toolCalls;
private readonly List<string> _toolLog = new();
public Task OnStartAsync(Agent agent, string input, CancellationToken ct)
{
_stopwatch.Start();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"╔══════════════════════════════════════╗");
Console.WriteLine($"║ Agent: {agent.Name,-28} ║");
Console.WriteLine($"║ Input: {input[..Math.Min(28, input.Length)],-28} ║");
Console.WriteLine($"╚══════════════════════════════════════╝");
Console.ResetColor();
return Task.CompletedTask;
}
public Task OnTokenAsync(AgentStreamToken token, CancellationToken ct)
{
_tokenCount++;
switch (token.Type)
{
case AgentStreamTokenType.Content:
Console.Write(token.Text);
break;
case AgentStreamTokenType.ToolCall:
_toolCalls++;
_toolLog.Add(token.Text);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n ⚡ {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($" → {token.Text}\n");
Console.ResetColor();
break;
case AgentStreamTokenType.Thinking:
// Suppress thinking tokens in the dashboard view
break;
}
return Task.CompletedTask;
}
public Task OnCompleteAsync(AgentExecutionResult result, CancellationToken ct)
{
_stopwatch.Stop();
Console.WriteLine();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"┌──────────────────────────────────────┐");
Console.WriteLine($"│ Completed in {_stopwatch.ElapsedMilliseconds,6}ms │");
Console.WriteLine($"│ Tokens streamed: {_tokenCount,5} │");
Console.WriteLine($"│ Tool calls: {_toolCalls,5} │");
Console.WriteLine($"└──────────────────────────────────────┘");
Console.ResetColor();
return Task.CompletedTask;
}
public Task OnErrorAsync(Exception exception, CancellationToken ct)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"\n ❌ Error: {exception.Message}");
Console.ResetColor();
return Task.CompletedTask;
}
}
Use the handler with RunStreamingAsync (note that the handler class itself needs `using System.Diagnostics;` for its `Stopwatch`):
var handler = new DashboardStreamHandler();
var dashboardResult = await agent.RunStreamingAsync(
"Calculate the monthly payment for a $300,000 mortgage at 6.5% interest over 30 years.",
handler);
Step 6: Use Built-In Console Handlers
LM-Kit.NET includes pre-built console handlers for quick development. RunStreamingToConsoleAsync prints content directly, with an optional verbose mode that shows all token types with color coding:
// ──────────────────────────────────────
// 8. Built-in console streaming
// ──────────────────────────────────────
// Standard mode: content only
Console.WriteLine("\n── Standard Console Stream ──\n");
await agent.RunStreamingToConsoleAsync("What are three benefits of local AI inference?",
verbose: false);
// Verbose mode: shows tools, thinking, status with colors
Console.WriteLine("\n\n── Verbose Console Stream ──\n");
await agent.RunStreamingToConsoleAsync(
"Search the web for the latest .NET 10 features and summarize them.",
verbose: true);
You can also create the handlers directly for more control:
// DelegateStreamHandler with pre-built presets
var consoleHandler = DelegateStreamHandler.Console(); // Content only
var verboseHandler = DelegateStreamHandler.ConsoleVerbose(); // All types, colored
await agent.RunStreamingAsync("Quick test", consoleHandler);
Step 7: Streaming with Cancellation
Support cancellation from user input (pressing Escape), a timeout, or an external signal by passing a CancellationToken:
// ──────────────────────────────────────
// 9. Cancellable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Cancellable Stream (press Escape to stop) ──\n");
using var cts = new CancellationTokenSource();
// Background task: watch for Escape key
_ = Task.Run(() =>
{
while (!cts.IsCancellationRequested)
{
if (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.Escape)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine("\n [Cancelled by user]");
Console.ResetColor();
cts.Cancel();
break;
}
Thread.Sleep(50);
}
});
try
{
Console.Write("Assistant: ");
await foreach (string text in agent.StreamContentAsync(
"Write a detailed essay about the history of artificial intelligence.",
cancellationToken: cts.Token))
{
Console.Write(text);
}
}
catch (OperationCanceledException)
{
Console.WriteLine("\n[Stream was cancelled gracefully]");
}
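A timeout can be combined with the same pattern using standard .NET linked token sources. In this sketch, `externalToken` is a hypothetical token (for example, from a UI "Stop" button); the agent call is the same StreamContentAsync used above:

```csharp
// Cancel on whichever comes first: the 30-second deadline or the external signal.
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
using var linked = CancellationTokenSource.CreateLinkedTokenSource(
    timeoutCts.Token, externalToken); // externalToken: hypothetical, e.g. a UI "Stop" button

try
{
    await foreach (string text in agent.StreamContentAsync(
        "Summarize the history of the transistor.",
        cancellationToken: linked.Token))
    {
        Console.Write(text);
    }
}
catch (OperationCanceledException)
{
    // Distinguish the two causes for logging.
    Console.WriteLine(timeoutCts.IsCancellationRequested
        ? "\n[Stopped: 30s timeout]"
        : "\n[Stopped: external cancellation]");
}
```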
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| No tokens appear, then entire response at once | Using ExecuteAsync instead of streaming methods | Switch to RunStreamingAsync, StreamAsync, or StreamContentAsync |
| Thinking tokens not visible | Model does not use a thinking/reasoning mode | Set ReasoningLevel on the agent, or use a model that supports thinking tokens |
| Tool calls appear as garbled JSON | Consuming ToolCall tokens as plain content | Check token.Type and handle ToolCall separately from Content |
| OnCompleteAsync never fires | Agent hit MaxIterations or threw an exception | Check OnErrorAsync for exceptions; increase MaxIterations if the agent runs out of steps |
| Slow first-token latency | Model not warmed up, or prompt is very long | Run a short warmup prompt first; reduce system prompt length if possible |
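The warmup fix from the table can be as small as one throwaway call made before the first user request. A sketch, assuming ExecuteAsync is the non-streaming execution method and using a hypothetical throwaway prompt:

```csharp
// One tiny non-streaming call forces model and context initialization up front,
// so the first real user request streams its first token sooner.
_ = await agent.ExecuteAsync("Hi");
Console.WriteLine("Agent warmed up and ready.");
```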
Next Steps
- Build a Conversational Assistant with Memory: add multi-turn memory to your streaming agent.
- Route Prompts Across Models with RouterOrchestrator: stream responses from dynamically routed agents.
- Build a Resilient Production Agent: handle streaming errors with retry and fallback policies.
- Control Token Sampling with Dynamic Strategies: adjust temperature and sampling to control streaming creativity.