Add Middleware Filters to Agents and Conversations
LM-Kit.NET provides a FilterPipeline that lets you attach middleware-style filters to intercept three stages of text generation: prompts (before inference), completions (after inference), and tool invocations (during the tool-calling loop). Filters follow the ASP.NET Core onion pattern: code before await next(context) runs on the way in, code after runs on the way out. This guide walks through each filter type with practical examples for logging, prompt rewriting, caching, rate limiting, and telemetry.
Why Middleware Filters
Two enterprise scenarios where filters provide cleaner solutions than ad-hoc event handlers:
- Composable cross-cutting concerns. Logging, caching, rate limiting, and moderation are separate responsibilities that should be independently developed, tested, and stacked. Filters let you compose them as a pipeline without modifying application logic or tool implementations.
- Prompt-level control. Unlike events, prompt filters can rewrite the input before inference or short-circuit entirely (e.g., returning a cached response). This enables patterns like semantic caching and input sanitization that are not possible with BeforeToolInvocation events alone.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n FilterPipelineQuickstart
cd FilterPipelineQuickstart
dotnet add package LM-Kit.NET
Step 2: Add a Prompt Filter for Logging
A prompt filter intercepts every submission before it reaches the model. Use it for logging, PII redaction, prompt augmentation, or caching.
using System.Diagnostics;
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Create conversation with prompt filter
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512
};
var stopwatch = new Stopwatch();
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
// Before inference
Console.WriteLine($"[LOG] Prompt: \"{ctx.Prompt}\"");
Console.WriteLine($"[LOG] System prompt: \"{ctx.SystemPrompt}\"");
Console.WriteLine($"[LOG] Is tool response: {ctx.IsToolResponse}");
stopwatch.Restart();
await next(ctx); // run inference
stopwatch.Stop();
// After inference
Console.WriteLine($"[LOG] Inference completed in {stopwatch.ElapsedMilliseconds}ms");
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
// ──────────────────────────────────────
// 3. Run a prompt
// ──────────────────────────────────────
Console.Write("Assistant: ");
chat.Submit("What is the middleware pattern?");
Console.WriteLine("\n");
The filter wraps the entire inference call. Code before next() runs first, then inference, then code after next().
Step 3: Rewrite Prompts Before Inference
Modify ctx.Prompt to change what the model sees without changing the caller's message:
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
// Append a constraint the model should follow
ctx.Prompt = ctx.Prompt + "\n\nRespond in three sentences or fewer.";
await next(ctx);
});
Multiple prompt filters execute in onion order. If Filter A is added before Filter B:
Filter A (before) → Filter B (before) → Inference → Filter B (after) → Filter A (after)
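This ordering can be demonstrated without any LM-Kit types. The sketch below is a plain-delegate stand-in (all names are illustrative, not LM-Kit APIs) that composes two layers by hand, showing how code on either side of await next(...) nests:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();
void Trace(string step) { log.Add(step); Console.WriteLine(step); }

// Illustrative stand-ins for prompt filters: each receives the prompt and the next stage.
Func<string, Func<string, Task>, Task> filterA = async (prompt, next) =>
{
    Trace("Filter A (before)");
    await next(prompt);              // descend into Filter B
    Trace("Filter A (after)");
};
Func<string, Func<string, Task>, Task> filterB = async (prompt, next) =>
{
    Trace("Filter B (before)");
    await next(prompt);              // descend into the terminal stage
    Trace("Filter B (after)");
};

// Terminal stage: stands in for the actual inference call.
Func<string, Task> inference = _ => { Trace("Inference"); return Task.CompletedTask; };

// Compose: A wraps B, B wraps inference, matching AddPromptFilter registration order.
await filterA("prompt", p => filterB(p, inference));
// log now holds: A before, B before, Inference, B after, A after
```

The key design point carries over directly: registration order determines nesting order, so the first filter added sees the prompt first and the result last.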
Step 4: Short-Circuit with a Cached Response
If a prompt filter sets ctx.Result before calling next(), inference is skipped entirely. This is the foundation for semantic caching:
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512
};
// Simple exact-match response cache
var responseCache = new Dictionary<string, TextGenerationResult>();
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
if (responseCache.TryGetValue(ctx.Prompt, out var cached))
{
Console.WriteLine("[CACHE HIT] Returning cached response.");
ctx.Result = cached; // short-circuit: inference is skipped
return;
}
await next(ctx);
// Store result for future lookups
if (ctx.Result != null)
responseCache[ctx.Prompt] = ctx.Result;
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("First call: ");
chat.Submit("What is 2+2?");
Console.Write("\n\nSecond call (same prompt): ");
chat.Submit("What is 2+2?");
Console.WriteLine("\n");
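Exact-match keys are brittle: "What is 2+2?" and "  what is 2+2? " miss each other. A lightweight improvement is to normalize whitespace and casing before the dictionary lookup; true semantic caching would compare embeddings instead. NormalizeKey below is a hypothetical helper, not an LM-Kit API:

```csharp
using System;

Console.WriteLine(NormalizeKey("  What  is 2+2? ")); // "what is 2+2?"

// Hypothetical helper: collapse whitespace and casing so near-identical
// prompt phrasings resolve to the same cache key.
static string NormalizeKey(string prompt) =>
    string.Join(' ', prompt.Trim().ToLowerInvariant()
        .Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));
```

With this in place, use responseCache with NormalizeKey(ctx.Prompt) as the key for both the lookup and the store.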
Step 5: Add a Completion Filter for Telemetry
Completion filters run after inference produces a result. Use them for telemetry, quality gates, output transformation, or response caching:
chat.Filters = new FilterPipeline()
.AddCompletionFilter(async (ctx, next) =>
{
await next(ctx);
// ctx.Result is now populated
if (ctx.Result != null)
{
Console.WriteLine($"[TELEMETRY] Tokens: {ctx.Result.GeneratedTokens.Count}");
Console.WriteLine($"[TELEMETRY] Speed: {ctx.Result.TokenGenerationRate:F1} tok/s");
Console.WriteLine($"[TELEMETRY] Quality: {ctx.Result.QualityScore:F2}");
}
});
Step 6: Share State Between Prompt and Completion Filters
The Properties dictionary is shared by reference between all filter contexts within a single request. Use it to pass data from prompt filters to completion filters:
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
ctx.Properties["requestId"] = Guid.NewGuid().ToString("N")[..8];
ctx.Properties["startTimestamp"] = Stopwatch.GetTimestamp();
Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Started");
await next(ctx);
})
.AddCompletionFilter(async (ctx, next) =>
{
await next(ctx);
var start = (long)ctx.Properties["startTimestamp"];
double ms = (double)(Stopwatch.GetTimestamp() - start) / Stopwatch.Frequency * 1000;
Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Completed in {ms:F0}ms");
});
Step 7: Add Tool Invocation Filters
Tool invocation filters wrap each individual tool call during the automatic tool-calling loop. They execute after the permission policy evaluation but before the actual tool invocation.
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
using LMKit.Agents.Tools;
using LMKit.Agents.Tools.BuiltIn;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant with tools.",
MaximumCompletionTokens = 512
};
chat.Tools.Register(BuiltInTools.CalcArithmetic);
chat.Tools.Register(BuiltInTools.DateTimeNow);
chat.Filters = new FilterPipeline()
.AddToolInvocationFilter(async (ctx, next) =>
{
Console.WriteLine($"\n [TOOL] {ctx.ToolCall.Name}({ctx.ToolCall.ArgumentsJson})");
Console.WriteLine($" [TOOL] Batch: {ctx.ToolIndex + 1}/{ctx.ToolCount}, Request cycle: {ctx.RequestIndex}");
await next(ctx);
Console.WriteLine($" [TOOL] Result: {ctx.Result?.ResultJson}");
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("Assistant: ");
chat.Submit("What is 256 * 789? And what day is it today?");
Console.WriteLine("\n");
Tool Filter Context Properties
| Property | Type | Description |
|---|---|---|
| ToolCall | ToolCall | The requested tool call (name and arguments) |
| Tool | ITool | The tool instance that will be invoked |
| PermissionResult | ToolPermissionResult | Result from permission policy evaluation |
| RequestIndex | int | Zero-based LLM request cycle index |
| ToolIndex | int | Position within the current batch of tool calls |
| ToolCount | int | Total tools in the current batch |
| Cancel | bool | Set to true to skip this tool's execution |
| Result | ToolCallResult? | Set to override the tool result (short-circuits execution) |
| Terminate | bool | Set to true to stop the model from requesting further tool calls |
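None of the numbered steps exercises Terminate. A minimal sketch in the same fragment style as Steps 8 and 9 (pipeline being a FilterPipeline assigned to chat.Filters) that caps the loop at two LLM request cycles might look like this; treat it as an assumption-laden sketch built from the properties in the table above, not a verified recipe:

```csharp
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
    await next(ctx); // let the current tool run normally

    // RequestIndex is zero-based: once we are in the second cycle (index 1),
    // tell the model not to request any further tool calls.
    if (ctx.RequestIndex >= 1)
        ctx.Terminate = true;
});
```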
Step 8: Rate-Limit Tool Calls
Prevent runaway agents from making too many tool calls by counting invocations:
int toolCallCount = 0;
const int MaxToolCalls = 10;
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
toolCallCount++;
if (toolCallCount > MaxToolCalls)
{
Console.WriteLine($"[RATE LIMIT] Blocked call #{toolCallCount}");
ctx.Cancel = true;
return;
}
await next(ctx);
});
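The plain int counter above is fine for a single conversation on one thread. If one pipeline instance serves concurrent requests, an atomic increment avoids lost updates. The following is a self-contained sketch of just the counting logic (no LM-Kit types; TryAcquireSlot is an illustrative name):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int toolCallCount = 0;
const int MaxToolCalls = 10;

// Stands in for the filter's admission check: true while the budget lasts.
// Interlocked.Increment makes the check safe under concurrent tool calls.
bool TryAcquireSlot() => Interlocked.Increment(ref toolCallCount) <= MaxToolCalls;

// Simulate 15 concurrent tool calls; exactly 10 increments are admitted.
int allowed = 0;
await Task.WhenAll(Enumerable.Range(0, 15).Select(_ => Task.Run(() =>
{
    if (TryAcquireSlot())
        Interlocked.Increment(ref allowed);
})));
Console.WriteLine($"Allowed: {allowed}"); // Allowed: 10
```

Inside the filter, a failed TryAcquireSlot() would set ctx.Cancel = true and return, exactly as in Step 8.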
Step 9: Cache Tool Results
Avoid redundant tool invocations by caching results keyed on tool name and arguments:
var toolCache = new Dictionary<string, ToolCallResult>();
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
string key = $"{ctx.ToolCall.Name}:{ctx.ToolCall.ArgumentsJson}";
if (toolCache.TryGetValue(key, out var cached))
{
ctx.Result = cached; // skip actual invocation
return;
}
await next(ctx);
if (ctx.Result != null)
toolCache[key] = ctx.Result;
});
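One refinement worth considering: results from a tool like DateTimeNow go stale, so an unbounded cache keeps serving yesterday's answer. A small time-bounded cache can replace the plain Dictionary; TtlCache below is a generic sketch, not an LM-Kit type:

```csharp
using System;
using System.Collections.Generic;

var cache = new TtlCache<string>(TimeSpan.FromMinutes(5));
cache.Set("date_time_now:{}", "2025-01-01T12:00:00Z");
Console.WriteLine(cache.TryGet("date_time_now:{}", out var hit)); // True

// Illustrative sketch: entries older than the TTL count as misses and are evicted.
class TtlCache<TValue>
{
    private readonly Dictionary<string, (TValue Value, DateTime StoredAt)> _entries = new();
    private readonly TimeSpan _ttl;

    public TtlCache(TimeSpan ttl) => _ttl = ttl;

    public bool TryGet(string key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var entry) &&
            DateTime.UtcNow - entry.StoredAt < _ttl)
        {
            value = entry.Value;
            return true;
        }
        _entries.Remove(key); // drop the expired entry (no-op if absent)
        value = default!;
        return false;
    }

    public void Set(string key, TValue value) => _entries[key] = (value, DateTime.UtcNow);
}
```

In the Step 9 filter, swap the Dictionary for this type and keep the same key format; you could also vary the TTL per tool so fast-changing tools expire sooner.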
Step 10: Use Filters with Agents via AgentBuilder
The AgentBuilder.WithFilters() method accepts a FilterPipeline instance or an inline configuration callback:
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.TextGeneration.Filters;
// Option A: pass a pre-built pipeline
var pipeline = new FilterPipeline()
.AddPromptFilter(async (ctx, next) => { /* ... */ await next(ctx); })
.AddToolInvocationFilter(async (ctx, next) => { /* ... */ await next(ctx); });
var agent = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful assistant.")
.WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
.WithFilters(pipeline)
.Build();
// Option B: configure inline
var agent2 = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful assistant.")
.WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
.WithFilters(filters =>
{
filters.AddPromptFilter(async (ctx, next) =>
{
Console.WriteLine($"[Agent] Prompt: {ctx.Prompt}");
await next(ctx);
});
})
.Build();
Step 11: Implement Class-Based Filters
For reusable, testable filters, implement the filter interfaces directly:
using LMKit.TextGeneration.Filters;
using Microsoft.Extensions.Logging;
public class AuditPromptFilter : IPromptFilter
{
private readonly ILogger _logger;
public AuditPromptFilter(ILogger logger)
{
_logger = logger;
}
public async Task OnPromptAsync(
PromptFilterContext context,
Func<PromptFilterContext, Task> next)
{
_logger.LogInformation("Prompt received: {Prompt}", context.Prompt);
await next(context);
_logger.LogInformation("Inference completed for prompt");
}
}
// Register with pipeline
var pipeline = new FilterPipeline();
pipeline.PromptFilters.Add(new AuditPromptFilter(logger));
The three filter interfaces:
| Interface | Method | When It Runs |
|---|---|---|
| IPromptFilter | OnPromptAsync(PromptFilterContext, next) | Before and after inference |
| ICompletionFilter | OnCompletionAsync(CompletionFilterContext, next) | After inference produces a result |
| IToolInvocationFilter | OnToolInvocationAsync(ToolInvocationFilterContext, next) | Around each individual tool call |
Filters vs. Events
Filters and events are complementary. Filters execute first; events fire afterward. Use the approach that best fits your scenario:
| Capability | Filters | Events |
|---|---|---|
| Modify prompt before inference | Yes (ctx.Prompt = ...) | No |
| Short-circuit inference | Yes (ctx.Result = ...) | No |
| Compose multiple middleware layers | Yes (onion pattern) | Limited (multiple handlers) |
| Share state across stages | Yes (Properties dictionary) | No built-in mechanism |
| Cancel a tool call | Yes (ctx.Cancel = true) | Yes (e.Cancel = true) |
| Override tool result | Yes (ctx.Result = ...) | No |
| Terminate tool-calling loop | Yes (ctx.Terminate = true) | No |
| Simple one-off logging | Either works | Simpler to set up |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Filters not executing | Pipeline not assigned | Set chat.Filters = pipeline or use AgentBuilder.WithFilters() |
| Inference skipped unexpectedly | A prompt filter sets ctx.Result without calling next() | Check all prompt filters for unintended short-circuits |
| Tool filter does not see all calls | Filter added after Submit() | Add filters before submitting prompts |
| Properties empty in completion filter | Different Dictionary instances | Ensure the same Properties dictionary flows through both contexts (this is automatic when using MultiTurnConversation.Filters) |
| Filters fire but events do not | Not a bug | Filters and events are independent; attach event handlers separately if needed |
Next Steps
- Intercept and Control Tool Invocations: event-based tool interception (complementary to filters)
- Secure Agent Tool Access with Permission Policies: combine with ToolPermissionPolicy for defense-in-depth
- Create an AI Agent with Tools: build agents with the ITool interface
- Build a Resilient Production Agent: error handling, retries, and observability for production agents