Stream Agent Responses in Real Time

Waiting for an agent to finish its entire reasoning chain before showing anything to the user creates a poor experience. Users see a blank screen for seconds (or minutes for complex tasks), then a wall of text appears all at once. LM-Kit.NET's agent streaming API delivers tokens as they are generated, gives you visibility into tool calls and thinking steps as they happen, and supports multiple consumption patterns: callbacks, async enumerables, and custom stream handlers. This guide builds a fully streaming agent system with live token display, thinking visibility, tool call tracking, and a custom dashboard handler.


Why Real-Time Streaming Matters

Two production problems that streaming solves:

  1. Perceived latency in user-facing applications. When an agent takes 8 seconds to generate a response, the user sees nothing for 8 seconds. With streaming, the first token appears in under 200ms and the response builds progressively. Users perceive the system as fast and responsive even when total generation time is unchanged.
  2. Observability into agentic reasoning. Complex agents make tool calls, think through problems, and iterate. Without streaming, you get only the final answer. With streaming, you can watch each reasoning step, see which tools were called, track thinking tokens, and detect problems in real time rather than debugging after the fact.
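
Time-to-first-token is easy to measure directly. A minimal sketch, assuming the agent built in Step 2 (the prompt string is illustrative):

```csharp
using System.Diagnostics;

var sw = Stopwatch.StartNew();
long? firstTokenMs = null; // null until the first chunk arrives

await agent.RunStreamingAsync(
    "Summarize the benefits of streaming in two sentences.",
    onContent: text =>
    {
        firstTokenMs ??= sw.ElapsedMilliseconds; // capture time-to-first-token once
        Console.Write(text);
    });

sw.Stop();
Console.WriteLine($"\n[TTFT: {firstTokenMs}ms | total: {sw.ElapsedMilliseconds}ms]");
```

Tracking both numbers makes the perceived-latency win concrete: total time stays roughly constant while TTFT drops to a fraction of it.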

Prerequisites

Requirement | Minimum
.NET SDK | 8.0+
VRAM | 4+ GB
Disk space | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n AgentStreaming
cd AgentStreaming
dotnet add package LM-Kit.NET

Step 2: Stream with a Simple Content Callback

The simplest streaming approach uses RunStreamingAsync with a string callback. You receive only the content text, one chunk at a time:

using System.Text;
using LMKit.Model;
using LMKit.Agents;
using LMKit.Agents.Streaming;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create an agent with tools
// ──────────────────────────────────────
Agent agent = Agent.CreateBuilder(model)
    .WithInstruction("You are a helpful research assistant with access to tools.")
    .WithTools(tools =>
    {
        tools.Register(BuiltInTools.Calculator);
        tools.Register(BuiltInTools.DateTime);
        tools.Register(BuiltInTools.WebSearch);
    })
    .WithMaxIterations(5)
    .Build();

// ──────────────────────────────────────
// 3. Stream content with a simple callback
// ──────────────────────────────────────
Console.Write("Assistant: ");
AgentExecutionResult result = await agent.RunStreamingAsync(
    "What is 15% of $2,340? Also, what day of the week is today?",
    onContent: text => Console.Write(text));

Console.WriteLine($"\n\n[Tokens used: {result.TokenUsage}]");

The onContent callback fires for every text chunk the model generates. The response builds character by character on screen.


Step 3: Stream with Full Token Metadata

For richer visibility, use the AgentStreamToken callback. Each token carries metadata including its type (content, thinking, tool call, tool result, status):

// ──────────────────────────────────────
// 4. Token-level streaming with metadata
// ──────────────────────────────────────
Console.WriteLine("\n── Token-Level Streaming ──\n");

int tokenCount = 0;

AgentExecutionResult detailedResult = await agent.RunStreamingAsync(
    "Explain the concept of compound interest and calculate the result of $10,000 at 5% for 10 years.",
    onToken: token =>
    {
        tokenCount++;

        switch (token.Type)
        {
            case AgentStreamTokenType.Content:
                Console.Write(token.Text);
                break;

            case AgentStreamTokenType.Thinking:
                Console.ForegroundColor = ConsoleColor.DarkGray;
                Console.Write(token.Text);
                Console.ResetColor();
                break;

            case AgentStreamTokenType.ToolCall:
                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.Write($"\n  🔧 Tool call: {token.Text}");
                Console.ResetColor();
                break;

            case AgentStreamTokenType.ToolResult:
                Console.ForegroundColor = ConsoleColor.Green;
                Console.Write($"\n  ✅ Result: {token.Text}");
                Console.ResetColor();
                break;

            case AgentStreamTokenType.Status:
                Console.ForegroundColor = ConsoleColor.Cyan;
                Console.Write($"\n  ℹ️ {token.Text}");
                Console.ResetColor();
                break;

            case AgentStreamTokenType.Error:
                Console.ForegroundColor = ConsoleColor.Red;
                Console.Write($"\n  ❌ {token.Text}");
                Console.ResetColor();
                break;
        }
    });

Console.WriteLine($"\n\n[Total stream tokens: {tokenCount}]");
Console.WriteLine($"[Final content length: {detailedResult.Content.Length} chars]");

The AgentStreamTokenType enum distinguishes between:

  • Content: the visible response text
  • Thinking: internal reasoning tokens (when the model uses a think/reason mode)
  • ToolCall: when the agent invokes a tool (includes tool name and arguments)
  • ToolResult: the tool's return value
  • Status: orchestration events (agent starting, iteration count)
  • Error: error messages during execution
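
A common pattern is to buffer Thinking tokens separately from visible content so the reasoning trace can be logged for later inspection without cluttering the UI. A sketch using the same onToken callback (the prompt and file name are illustrative):

```csharp
var visible = new StringBuilder();
var thinking = new StringBuilder();

await agent.RunStreamingAsync(
    "Plan the steps needed to compare two mortgage offers.",
    onToken: token =>
    {
        switch (token.Type)
        {
            case AgentStreamTokenType.Content:
                visible.Append(token.Text);
                Console.Write(token.Text); // show only the answer to the user
                break;

            case AgentStreamTokenType.Thinking:
                thinking.Append(token.Text); // collect reasoning silently
                break;
        }
    });

// Persist the reasoning trace for offline debugging
File.WriteAllText("reasoning-trace.txt", thinking.ToString());
```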

Step 4: Consume Streams with IAsyncEnumerable

For LINQ-style processing and await foreach patterns, use the StreamAsync extension method:

// ──────────────────────────────────────
// 5. IAsyncEnumerable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Async Enumerable Stream ──\n");

var contentBuilder = new StringBuilder();
int toolCallsSeen = 0;

await foreach (AgentStreamToken token in agent.StreamAsync(
    "Search the web for the current population of Tokyo and calculate what percentage it is of Japan's total population."))
{
    switch (token.Type)
    {
        case AgentStreamTokenType.Content:
            contentBuilder.Append(token.Text);
            Console.Write(token.Text);
            break;

        case AgentStreamTokenType.ToolCall:
            toolCallsSeen++;
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write($"\n  [Tool #{toolCallsSeen}] {token.Text}");
            Console.ResetColor();
            break;

        case AgentStreamTokenType.ToolResult:
            Console.ForegroundColor = ConsoleColor.Green;
            Console.Write($"\n  [Result] {token.Text}\n");
            Console.ResetColor();
            break;
    }
}

Console.WriteLine($"\n\nFinal content length: {contentBuilder.Length} chars");
Console.WriteLine($"Tool calls observed: {toolCallsSeen}");

The StreamAsync method returns an IAsyncEnumerable<AgentStreamToken> that you can filter, transform, or aggregate using standard async LINQ. For content-only streaming, use StreamContentAsync which yields plain strings:

// ──────────────────────────────────────
// 6. Content-only async enumerable
// ──────────────────────────────────────
Console.Write("\nAssistant: ");
await foreach (string text in agent.StreamContentAsync("What is the meaning of life?"))
{
    Console.Write(text);
}
Console.WriteLine();
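
Because StreamAsync returns IAsyncEnumerable&lt;AgentStreamToken&gt;, the stream composes with async LINQ operators. A sketch, assuming the System.Linq.Async NuGet package is installed (Where and CountAsync come from that package, not from LM-Kit.NET):

```csharp
using System.Linq;

int contentTokens = await agent
    .StreamAsync("List three uses of streaming.")
    .Where(t => t.Type == AgentStreamTokenType.Content) // keep only visible text tokens
    .CountAsync();

Console.WriteLine($"Content tokens: {contentTokens}");
```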

Step 5: Build a Custom Stream Handler

For production applications, implement IAgentStreamHandler to build a structured streaming pipeline with lifecycle hooks:

// ──────────────────────────────────────
// 7. Custom stream handler
// ──────────────────────────────────────
public sealed class DashboardStreamHandler : IAgentStreamHandler
{
    private readonly Stopwatch _stopwatch = new();
    private int _tokenCount;
    private int _toolCalls;
    private readonly List<string> _toolLog = new();

    public Task OnStartAsync(Agent agent, string input, CancellationToken ct)
    {
        _stopwatch.Start();
        Console.ForegroundColor = ConsoleColor.Cyan;
        Console.WriteLine($"╔══════════════════════════════════════╗");
        Console.WriteLine($"║  Agent: {agent.Name,-28} ║");
        Console.WriteLine($"║  Input: {input[..Math.Min(28, input.Length)],-28} ║");
        Console.WriteLine($"╚══════════════════════════════════════╝");
        Console.ResetColor();
        return Task.CompletedTask;
    }

    public Task OnTokenAsync(AgentStreamToken token, CancellationToken ct)
    {
        _tokenCount++;

        switch (token.Type)
        {
            case AgentStreamTokenType.Content:
                Console.Write(token.Text);
                break;

            case AgentStreamTokenType.ToolCall:
                _toolCalls++;
                _toolLog.Add(token.Text);
                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.Write($"\n  ⚡ {token.Text}");
                Console.ResetColor();
                break;

            case AgentStreamTokenType.ToolResult:
                Console.ForegroundColor = ConsoleColor.Green;
                Console.Write($" → {token.Text}\n");
                Console.ResetColor();
                break;

            case AgentStreamTokenType.Thinking:
                // Suppress thinking tokens in the dashboard view; only content and tool events are shown
                break;
        }

        return Task.CompletedTask;
    }

    public Task OnCompleteAsync(AgentExecutionResult result, CancellationToken ct)
    {
        _stopwatch.Stop();
        Console.WriteLine();
        Console.ForegroundColor = ConsoleColor.Cyan;
        Console.WriteLine($"┌──────────────────────────────────────┐");
        Console.WriteLine($"│  Completed in {_stopwatch.ElapsedMilliseconds,6}ms              │");
        Console.WriteLine($"│  Tokens streamed: {_tokenCount,5}               │");
        Console.WriteLine($"│  Tool calls: {_toolCalls,5}                    │");
        Console.WriteLine($"└──────────────────────────────────────┘");
        Console.ResetColor();
        return Task.CompletedTask;
    }

    public Task OnErrorAsync(Exception exception, CancellationToken ct)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"\n  ❌ Error: {exception.Message}");
        Console.ResetColor();
        return Task.CompletedTask;
    }
}

Use the handler with RunStreamingAsync. The handler relies on Stopwatch, so add using System.Diagnostics; alongside the other using directives at the top of the file:

var handler = new DashboardStreamHandler();

var dashboardResult = await agent.RunStreamingAsync(
    "Calculate the monthly payment for a $300,000 mortgage at 6.5% interest over 30 years.",
    handler);
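
As a sanity check on the agent's answer, the figure can be verified independently with the standard amortization formula M = P·r·(1+r)^n / ((1+r)^n − 1), where r is the monthly rate and n the number of payments:

```csharp
double principal = 300_000;
double r = 0.065 / 12;  // monthly interest rate
int n = 30 * 12;        // number of monthly payments

double factor = Math.Pow(1 + r, n);
double monthly = principal * r * factor / (factor - 1);

Console.WriteLine($"Expected monthly payment: {monthly:F2}"); // ≈ 1896.20
```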

Step 6: Use Built-In Console Handlers

LM-Kit.NET includes pre-built console handlers for quick development. RunStreamingToConsoleAsync prints content directly, with an optional verbose mode that shows all token types with color coding:

// ──────────────────────────────────────
// 8. Built-in console streaming
// ──────────────────────────────────────

// Standard mode: content only
Console.WriteLine("\n── Standard Console Stream ──\n");
await agent.RunStreamingToConsoleAsync("What are three benefits of local AI inference?",
    verbose: false);

// Verbose mode: shows tools, thinking, status with colors
Console.WriteLine("\n\n── Verbose Console Stream ──\n");
await agent.RunStreamingToConsoleAsync(
    "Search the web for the latest .NET 10 features and summarize them.",
    verbose: true);

You can also create the handlers directly for more control:

// DelegateStreamHandler with pre-built presets
var consoleHandler = DelegateStreamHandler.Console();          // Content only
var verboseHandler = DelegateStreamHandler.ConsoleVerbose();   // All types, colored

await agent.RunStreamingAsync("Quick test", consoleHandler);

Step 7: Streaming with Cancellation

Support user-initiated cancellation (pressing Escape, timeout, or external signal) using CancellationToken:

// ──────────────────────────────────────
// 9. Cancellable streaming
// ──────────────────────────────────────
Console.WriteLine("\n── Cancellable Stream (press Escape to stop) ──\n");

using var cts = new CancellationTokenSource();

// Background task: watch for Escape key
_ = Task.Run(() =>
{
    while (!cts.IsCancellationRequested)
    {
        if (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.Escape)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine("\n  [Cancelled by user]");
            Console.ResetColor();
            cts.Cancel();
            break;
        }
        Thread.Sleep(50);
    }
});

try
{
    Console.Write("Assistant: ");
    await foreach (string text in agent.StreamContentAsync(
        "Write a detailed essay about the history of artificial intelligence.",
        cancellationToken: cts.Token))
    {
        Console.Write(text);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("\n[Stream was cancelled gracefully]");
}
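
The same pattern covers timeouts: CancellationTokenSource accepts a TimeSpan and cancels itself when it elapses. A sketch (the 30-second budget is illustrative):

```csharp
// Cancel the stream automatically if generation runs longer than 30 seconds.
using var timeoutCts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    await foreach (string text in agent.StreamContentAsync(
        "Write a detailed essay about the history of artificial intelligence.",
        cancellationToken: timeoutCts.Token))
    {
        Console.Write(text);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("\n[Stream timed out]");
}
```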

Common Issues

Problem | Cause | Fix
No tokens appear, then the entire response at once | Using ExecuteAsync instead of a streaming method | Switch to RunStreamingAsync, StreamAsync, or StreamContentAsync
Thinking tokens not visible | Model does not use a thinking/reasoning mode | Set ReasoningLevel on the agent, or use a model that supports thinking tokens
Tool calls appear as garbled JSON | Consuming ToolCall tokens as plain content | Check token.Type and handle ToolCall separately from Content
OnCompleteAsync never fires | Agent hit MaxIterations or threw an exception | Check OnErrorAsync for exceptions; increase MaxIterations if the agent runs out of steps
Slow first-token latency | Model not warmed up, or prompt is very long | Run a short warmup prompt first; reduce system prompt length if possible

Next Steps