Stream Agent Responses in Real Time
Waiting for an agent to finish its entire reasoning chain before showing anything to the user creates a poor experience. Users see a blank screen for seconds (or minutes for complex tasks), then a wall of text appears all at once. LM-Kit.NET's agent streaming API delivers tokens as they are generated, gives you visibility into tool calls and thinking steps as they happen, and supports multiple consumption patterns: callbacks, async enumerables, and custom stream handlers. This guide builds a fully streaming agent system with live token display, thinking visibility, tool call tracking, and a custom dashboard handler.
Why Real-Time Streaming Matters
Two production problems that streaming solves:
- Perceived latency in user-facing applications. When an agent takes 8 seconds to generate a response, the user sees nothing for 8 seconds. With streaming, the first token appears in under 200ms and the response builds progressively. Users perceive the system as fast and responsive even when total generation time is unchanged.
- Observability into agentic reasoning. Complex agents make tool calls, think through problems, and iterate. Without streaming, you get only the final answer. With streaming, you can watch each reasoning step, see which tools were called, track thinking tokens, and detect problems in real time rather than debugging after the fact.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n AgentStreaming
cd AgentStreaming
dotnet add package LM-Kit.NET
Step 2: Stream with a Simple Content Callback
The simplest streaming approach uses RunStreamingAsync with a string callback. You receive only the content text, one chunk at a time:
using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Streaming;
using LMKit.Agents.Tools.BuiltIn;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Create an agent with tools
// ──────────────────────────────────────
Agent agent = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful research assistant with access to tools.")
.WithTools(tools =>
{
tools.Register(BuiltInTools.Calculator);
tools.Register(BuiltInTools.DateTime);
tools.Register(BuiltInTools.WebSearch);
})
.WithMaxIterations(5)
.Build();
// ──────────────────────────────────────
// 3. Stream content with a simple callback
// ──────────────────────────────────────
Console.Write("Assistant: ");
AgentExecutionResult result = await agent.RunStreamingAsync(
"What is 15% of $2,340? Also, what day of the week is today?",
onContent: text => Console.Write(text));
Console.WriteLine($"\n\n[Tokens used: {result.TokenUsage}]");
The onContent callback fires for every text chunk the model generates, so the response builds on screen as it is produced.
Step 3: Stream with Full Token Metadata
For richer visibility, use the AgentStreamToken callback. Each token carries metadata including its type (content, thinking, tool call, tool result, status, or error):
// ──────────────────────────────────────
// 4. Token-level streaming with metadata
// ──────────────────────────────────────
Console.WriteLine("\n── Token-Level Streaming ──\n");
int tokenCount = 0;
AgentExecutionResult detailedResult = await agent.RunStreamingAsync(
"Explain the concept of compound interest and calculate the result of $10,000 at 5% for 10 years.",
onToken: token =>
{
tokenCount++;
switch (token.Type)
{
case AgentStreamTokenType.Content:
Console.Write(token.Text);
break;
case AgentStreamTokenType.Thinking:
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.Write(token.Text);
Console.ResetColor();
break;
case AgentStreamTokenType.ToolCall:
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n 🔧 Tool call: {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($"\n ✅ Result: {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.Status:
Console.ForegroundColor = ConsoleColor.Cyan;
Console.Write($"\n ℹ️ {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.Error:
Console.ForegroundColor = ConsoleColor.Red;
Console.Write($"\n ❌ {token.Text}");
Console.ResetColor();
break;
}
});
Console.WriteLine($"\n\n[Total stream tokens: {tokenCount}]");
Console.WriteLine($"[Final content length: {detailedResult.Content.Length} chars]");
The AgentStreamTokenType enum distinguishes between:
- Content: the visible response text
- Thinking: internal reasoning tokens (when the model uses a think/reason mode)
- ToolCall: when the agent invokes a tool (includes tool name and arguments)
- ToolResult: the tool's return value
- Status: orchestration events (agent starting, iteration count)
- Error: error messages during execution
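Because every token is typed, the stream can be aggregated into a per-run trace instead of (or alongside) live display. The sketch below, assuming the RunStreamingAsync and AgentStreamToken API shown above and a hypothetical prompt, tallies how many tokens of each type a run produced:

```csharp
// Sketch: tally tokens by type to see how a run was spent
// (content vs. thinking vs. tool activity).
var counts = new Dictionary<AgentStreamTokenType, int>();

AgentExecutionResult traced = await agent.RunStreamingAsync(
    "What day of the week is today?",
    onToken: token =>
    {
        // Count every token type; no display, just instrumentation.
        counts[token.Type] = counts.TryGetValue(token.Type, out int n) ? n + 1 : 1;
    });

foreach (var (type, count) in counts)
    Console.WriteLine($"{type,-12} {count,5}");
```

A tally like this is cheap enough to leave enabled in production logging, since the callback does no console I/O per token.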
Step 4: Consume Streams with IAsyncEnumerable
For LINQ-style processing and await foreach patterns, use the StreamAsync extension method:
// ──────────────────────────────────────
// 5. IAsyncEnumerable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Async Enumerable Stream ──\n");
var contentBuilder = new StringBuilder();
int toolCallsSeen = 0;
await foreach (AgentStreamToken token in agent.StreamAsync(
"Search the web for the current population of Tokyo and calculate what percentage it is of Japan's total population."))
{
switch (token.Type)
{
case AgentStreamTokenType.Content:
contentBuilder.Append(token.Text);
Console.Write(token.Text);
break;
case AgentStreamTokenType.ToolCall:
toolCallsSeen++;
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n [Tool #{toolCallsSeen}] {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($"\n [Result] {token.Text}\n");
Console.ResetColor();
break;
}
}
Console.WriteLine($"\n\nFinal content length: {contentBuilder.Length} chars");
Console.WriteLine($"Tool calls observed: {toolCallsSeen}");
The StreamAsync method returns an IAsyncEnumerable<AgentStreamToken> that you can filter, transform, or aggregate using standard async LINQ. For content-only streaming, use StreamContentAsync which yields plain strings:
// ──────────────────────────────────────
// 6. Content-only async enumerable
// ──────────────────────────────────────
Console.Write("\nAssistant: ");
await foreach (string text in agent.StreamContentAsync("What is the meaning of life?"))
{
Console.Write(text);
}
Console.WriteLine();
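The filter/transform/aggregate claim above can be sketched with async LINQ operators. This example assumes the System.Linq.Async NuGet package (which provides `Where` over `IAsyncEnumerable<T>`) and a hypothetical prompt; it surfaces only tool activity and drops content tokens:

```csharp
// Requires: dotnet add package System.Linq.Async
// Filter the token stream down to tool events only.
await foreach (AgentStreamToken token in agent
    .StreamAsync("Compute 7! and report the result.")
    .Where(t => t.Type == AgentStreamTokenType.ToolCall
             || t.Type == AgentStreamTokenType.ToolResult))
{
    Console.WriteLine($"[{token.Type}] {token.Text}");
}
```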
Step 5: Build a Custom Stream Handler
For production applications, implement IAgentStreamHandler to build a structured streaming pipeline with lifecycle hooks:
// ──────────────────────────────────────
// 7. Custom stream handler
// ──────────────────────────────────────
using System.Diagnostics; // Stopwatch lives in System.Diagnostics
public sealed class DashboardStreamHandler : IAgentStreamHandler
{
private readonly Stopwatch _stopwatch = new();
private int _tokenCount;
private int _toolCalls;
private readonly List<string> _toolLog = new();
public Task OnStartAsync(Agent agent, string input, CancellationToken ct)
{
_stopwatch.Start();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"╔══════════════════════════════════════╗");
Console.WriteLine($"║ Agent: {agent.Name,-28} ║");
Console.WriteLine($"║ Input: {input[..Math.Min(28, input.Length)],-28} ║");
Console.WriteLine($"╚══════════════════════════════════════╝");
Console.ResetColor();
return Task.CompletedTask;
}
public Task OnTokenAsync(AgentStreamToken token, CancellationToken ct)
{
_tokenCount++;
switch (token.Type)
{
case AgentStreamTokenType.Content:
Console.Write(token.Text);
break;
case AgentStreamTokenType.ToolCall:
_toolCalls++;
_toolLog.Add(token.Text);
Console.ForegroundColor = ConsoleColor.Yellow;
Console.Write($"\n ⚡ {token.Text}");
Console.ResetColor();
break;
case AgentStreamTokenType.ToolResult:
Console.ForegroundColor = ConsoleColor.Green;
Console.Write($" → {token.Text}\n");
Console.ResetColor();
break;
case AgentStreamTokenType.Thinking:
// Suppress thinking tokens in the dashboard view
break;
}
return Task.CompletedTask;
}
public Task OnCompleteAsync(AgentExecutionResult result, CancellationToken ct)
{
_stopwatch.Stop();
Console.WriteLine();
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"┌──────────────────────────────────────┐");
Console.WriteLine($"│ Completed in {_stopwatch.ElapsedMilliseconds,6}ms │");
Console.WriteLine($"│ Tokens streamed: {_tokenCount,5} │");
Console.WriteLine($"│ Tool calls: {_toolCalls,5} │");
Console.WriteLine($"└──────────────────────────────────────┘");
Console.ResetColor();
return Task.CompletedTask;
}
public Task OnErrorAsync(Exception exception, CancellationToken ct)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine($"\n ❌ Error: {exception.Message}");
Console.ResetColor();
return Task.CompletedTask;
}
}
Use the handler with RunStreamingAsync (note that the handler class itself needs `using System.Diagnostics;` for its `Stopwatch`):
var handler = new DashboardStreamHandler();
var dashboardResult = await agent.RunStreamingAsync(
"Calculate the monthly payment for a $300,000 mortgage at 6.5% interest over 30 years.",
handler);
Step 6: Use Built-In Console Handlers
LM-Kit.NET includes pre-built console handlers for quick development. RunStreamingToConsoleAsync prints content directly, with an optional verbose mode that shows all token types with color coding:
// ──────────────────────────────────────
// 8. Built-in console streaming
// ──────────────────────────────────────
// Standard mode: content only
Console.WriteLine("\n── Standard Console Stream ──\n");
await agent.RunStreamingToConsoleAsync("What are three benefits of local AI inference?",
verbose: false);
// Verbose mode: shows tools, thinking, status with colors
Console.WriteLine("\n\n── Verbose Console Stream ──\n");
await agent.RunStreamingToConsoleAsync(
"Search the web for the latest .NET 10 features and summarize them.",
verbose: true);
You can also create the handlers directly for more control:
// DelegateStreamHandler with pre-built presets
var consoleHandler = DelegateStreamHandler.Console(); // Content only
var verboseHandler = DelegateStreamHandler.ConsoleVerbose(); // All types, colored
await agent.RunStreamingAsync("Quick test", consoleHandler);
Step 7: Streaming with Cancellation
Support cancellation from user input (pressing Escape), a timeout, or an external signal by passing a CancellationToken:
// ──────────────────────────────────────
// 9. Cancellable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Cancellable Stream (press Escape to stop) ──\n");
using var cts = new CancellationTokenSource();
// Background task: watch for Escape key
_ = Task.Run(() =>
{
while (!cts.IsCancellationRequested)
{
if (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.Escape)
{
Console.ForegroundColor = ConsoleColor.Red;
Console.WriteLine("\n [Cancelled by user]");
Console.ResetColor();
cts.Cancel();
break;
}
Thread.Sleep(50);
}
});
try
{
Console.Write("Assistant: ");
await foreach (string text in agent.StreamContentAsync(
"Write a detailed essay about the history of artificial intelligence.",
cancellationToken: cts.Token))
{
Console.Write(text);
}
}
catch (OperationCanceledException)
{
Console.WriteLine("\n[Stream was cancelled gracefully]");
}
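A timeout can be combined with the same pattern using standard .NET linked token sources. In this sketch, `externalToken` is a hypothetical token (for example, from a UI "Stop" button); the agent call is the same StreamContentAsync used above:

```csharp
// Cancel on whichever comes first: the 30-second deadline or the external signal.
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
using var linked = CancellationTokenSource.CreateLinkedTokenSource(
    timeoutCts.Token, externalToken); // externalToken: hypothetical, e.g. a UI "Stop" button

try
{
    await foreach (string text in agent.StreamContentAsync(
        "Summarize the history of the transistor.",
        cancellationToken: linked.Token))
    {
        Console.Write(text);
    }
}
catch (OperationCanceledException)
{
    // Distinguish the two causes for logging.
    Console.WriteLine(timeoutCts.IsCancellationRequested
        ? "\n[Stopped: 30s timeout]"
        : "\n[Stopped: external cancellation]");
}
```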
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| No tokens appear, then entire response at once | Using ExecuteAsync instead of streaming methods | Switch to RunStreamingAsync, StreamAsync, or StreamContentAsync |
| Thinking tokens not visible | Model does not use a thinking/reasoning mode | Set ReasoningLevel on the agent, or use a model that supports thinking tokens |
| Tool calls appear as garbled JSON | Consuming ToolCall tokens as plain content | Check token.Type and handle ToolCall separately from Content |
| OnCompleteAsync never fires | Agent hit MaxIterations or threw an exception | Check OnErrorAsync for exceptions; increase MaxIterations if the agent runs out of steps |
| Slow first-token latency | Model not warmed up, or prompt is very long | Run a short warmup prompt first; reduce system prompt length if possible |
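The warmup fix from the table can be as small as one throwaway call made before the first user request. A sketch, assuming ExecuteAsync is the non-streaming execution method and using a hypothetical throwaway prompt:

```csharp
// One tiny non-streaming call forces model and context initialization up front,
// so the first real user request streams its first token sooner.
_ = await agent.ExecuteAsync("Hi");
Console.WriteLine("Agent warmed up and ready.");
```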
Next Steps
- Build a Conversational Assistant with Memory: add multi-turn memory to your streaming agent.
- Route Prompts Across Models with RouterOrchestrator: stream responses from dynamically routed agents.
- Build a Resilient Production Agent: handle streaming errors with retry and fallback policies.
- Control Token Sampling with Dynamic Strategies: adjust temperature and sampling to control streaming creativity.