Add Middleware Filters to Agents and Conversations

LM-Kit.NET provides a FilterPipeline that lets you attach middleware-style filters to intercept three stages of text generation: prompts (before inference), completions (after inference), and tool invocations (during the tool-calling loop). Filters follow the same onion pattern as ASP.NET Core middleware: code before await next(context) runs on the way in, and code after it runs on the way out. This guide walks through each filter type with practical examples covering logging, prompt rewriting, caching, rate limiting, content moderation, and telemetry.


Why Middleware Filters

Two enterprise scenarios where filters provide cleaner solutions than ad-hoc event handlers:

  1. Composable cross-cutting concerns. Logging, caching, rate limiting, and moderation are separate responsibilities that should be independently developed, tested, and stacked. Filters let you compose them as a pipeline without modifying application logic or tool implementations.
  2. Prompt-level control. Unlike events, prompt filters can rewrite the input before inference or short-circuit it entirely (e.g., by returning a cached response). This enables patterns such as semantic caching and input sanitization that are not possible with event handlers alone.

Prerequisites

Requirement  Minimum
-----------  -------
.NET SDK     8.0+
VRAM         4+ GB
Disk         ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n FilterPipelineQuickstart
cd FilterPipelineQuickstart
dotnet add package LM-Kit.NET

Step 2: Add a Prompt Filter for Logging

A prompt filter intercepts every submission before it reaches the model. Use it for logging, PII redaction, prompt augmentation, or caching.

using System.Diagnostics;
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create conversation with prompt filter
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful assistant.",
    MaximumCompletionTokens = 512
};

var stopwatch = new Stopwatch();

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        // Before inference
        Console.WriteLine($"[LOG] Prompt: \"{ctx.Prompt}\"");
        Console.WriteLine($"[LOG] System prompt: \"{ctx.SystemPrompt}\"");
        Console.WriteLine($"[LOG] Is tool response: {ctx.IsToolResponse}");

        stopwatch.Restart();
        await next(ctx);  // run inference
        stopwatch.Stop();

        // After inference
        Console.WriteLine($"[LOG] Inference completed in {stopwatch.ElapsedMilliseconds}ms");
    });

chat.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

// ──────────────────────────────────────
// 3. Run a prompt
// ──────────────────────────────────────
Console.Write("Assistant: ");
chat.Submit("What is the middleware pattern?");
Console.WriteLine("\n");

The filter wraps the entire inference call. Code before next() runs first, then inference, then code after next().


Step 3: Rewrite Prompts Before Inference

Modify ctx.Prompt to change what the model sees without changing the caller's message:

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        // Append a constraint the model should follow
        ctx.Prompt = ctx.Prompt + "\n\nRespond in three sentences or fewer.";

        await next(ctx);
    });
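The same rewrite hook supports input sanitization. A minimal sketch that masks email addresses before they reach the model (the regex and replacement token are illustrative, not part of LM-Kit.NET):

using System.Text.RegularExpressions;

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        // Illustrative pattern: mask anything shaped like an email address
        ctx.Prompt = Regex.Replace(
            ctx.Prompt,
            @"[\w.+-]+@[\w-]+\.[\w.-]+",
            "[REDACTED_EMAIL]");

        await next(ctx);
    });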

Multiple prompt filters execute in onion order. If Filter A is added before Filter B:

Filter A (before) → Filter B (before) → Inference → Filter B (after) → Filter A (after)
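A quick way to verify this ordering is to stack two logging filters and watch the console:

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        Console.WriteLine("Filter A: before");  // runs first
        await next(ctx);
        Console.WriteLine("Filter A: after");   // runs last
    })
    .AddPromptFilter(async (ctx, next) =>
    {
        Console.WriteLine("Filter B: before");  // runs second
        await next(ctx);                        // inference runs at the innermost point
        Console.WriteLine("Filter B: after");   // runs third
    });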

Step 4: Short-Circuit with a Cached Response

If a prompt filter sets ctx.Result before calling next(), inference is skipped entirely. This is the foundation for semantic caching:

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful assistant.",
    MaximumCompletionTokens = 512
};

// Simple exact-match response cache
var responseCache = new Dictionary<string, TextGenerationResult>();

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        if (responseCache.TryGetValue(ctx.Prompt, out var cached))
        {
            Console.WriteLine("[CACHE HIT] Returning cached response.");
            ctx.Result = cached;  // short-circuit: inference is skipped
            return;
        }

        await next(ctx);

        // Store result for future lookups
        if (ctx.Result != null)
            responseCache[ctx.Prompt] = ctx.Result;
    });

chat.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.Write("First call: ");
chat.Submit("What is 2+2?");
Console.Write("\n\nSecond call (same prompt): ");
chat.Submit("What is 2+2?");
Console.WriteLine("\n");
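The step from exact-match to semantic caching is to key on embedding similarity instead of string equality. Below is a minimal sketch, assuming a hypothetical Embed helper (LM-Kit.NET's embedding API is not covered in this guide; wire in your own provider) and an illustrative similarity threshold of 0.95:

var semanticCache = new List<(float[] Embedding, TextGenerationResult Result)>();

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        float[] query = Embed(ctx.Prompt);  // hypothetical helper, stubbed below

        foreach (var entry in semanticCache)
        {
            if (CosineSimilarity(entry.Embedding, query) > 0.95f)  // illustrative threshold
            {
                ctx.Result = entry.Result;  // short-circuit, as in the exact-match example
                return;
            }
        }

        await next(ctx);

        if (ctx.Result != null)
            semanticCache.Add((query, ctx.Result));
    });

// Placeholder: replace with a real embedding call from your provider of choice.
static float[] Embed(string text) =>
    throw new NotImplementedException("Plug in your embedding provider.");

// Plain cosine similarity over two equal-length vectors.
static float CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return (float)(dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-9));
}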

Step 5: Add a Completion Filter for Telemetry

Completion filters run after inference produces a result. Use them for telemetry, quality gates, output transformation, or response caching:

chat.Filters = new FilterPipeline()
    .AddCompletionFilter(async (ctx, next) =>
    {
        await next(ctx);

        // ctx.Result is now populated
        if (ctx.Result != null)
        {
            Console.WriteLine($"[TELEMETRY] Tokens: {ctx.Result.GeneratedTokens.Count}");
            Console.WriteLine($"[TELEMETRY] Speed: {ctx.Result.TokenGenerationRate:F1} tok/s");
            Console.WriteLine($"[TELEMETRY] Quality: {ctx.Result.QualityScore:F2}");
        }
    });

Step 6: Share State Between Prompt and Completion Filters

The Properties dictionary is shared by reference between all filter contexts within a single request. Use it to pass data from prompt filters to completion filters:

chat.Filters = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) =>
    {
        ctx.Properties["requestId"] = Guid.NewGuid().ToString("N")[..8];
        ctx.Properties["startTimestamp"] = Stopwatch.GetTimestamp();
        Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Started");

        await next(ctx);
    })
    .AddCompletionFilter(async (ctx, next) =>
    {
        await next(ctx);

        var start = (long)ctx.Properties["startTimestamp"];
        double ms = (double)(Stopwatch.GetTimestamp() - start) / Stopwatch.Frequency * 1000;
        Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Completed in {ms:F0}ms");
    });

Step 7: Add Tool Invocation Filters

Tool invocation filters wrap each individual tool call during the automatic tool-calling loop. They execute after the permission policy evaluation but before the actual tool invocation.

using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
using LMKit.Agents.Tools;
using LMKit.Agents.Tools.BuiltIn;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
    loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

var chat = new MultiTurnConversation(model)
{
    SystemPrompt = "You are a helpful assistant with tools.",
    MaximumCompletionTokens = 512
};

chat.Tools.Register(BuiltInTools.CalcArithmetic);
chat.Tools.Register(BuiltInTools.DateTimeNow);

chat.Filters = new FilterPipeline()
    .AddToolInvocationFilter(async (ctx, next) =>
    {
        Console.WriteLine($"\n  [TOOL] {ctx.ToolCall.Name}({ctx.ToolCall.ArgumentsJson})");
        Console.WriteLine($"  [TOOL] Batch: {ctx.ToolIndex + 1}/{ctx.ToolCount}, Request cycle: {ctx.RequestIndex}");

        await next(ctx);

        Console.WriteLine($"  [TOOL] Result: {ctx.Result?.ResultJson}");
    });

chat.AfterTextCompletion += (_, e) =>
{
    if (e.SegmentType == TextSegmentType.UserVisible)
        Console.Write(e.Text);
};

Console.Write("Assistant: ");
chat.Submit("What is 256 * 789? And what day is it today?");
Console.WriteLine("\n");

Tool Filter Context Properties

Property          Type                  Description
--------          ----                  -----------
ToolCall          ToolCall              The requested tool call (name and arguments)
Tool              ITool                 The tool instance that will be invoked
PermissionResult  ToolPermissionResult  Result from permission policy evaluation
RequestIndex      int                   Zero-based LLM request cycle index
ToolIndex         int                   Position within the current batch of tool calls
ToolCount         int                   Total tools in the current batch
Cancel            bool                  Set to true to skip this tool's execution
Result            ToolCallResult?       Set to override the tool result (short-circuits execution)
Terminate         bool                  Set to true to stop the model from requesting further tool calls

Step 8: Rate-Limit Tool Calls

Prevent runaway agents from making too many tool calls by counting invocations:

int toolCallCount = 0;
const int MaxToolCalls = 10;

// 'pipeline' is the FilterPipeline assigned to chat.Filters (see Step 7)
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
    toolCallCount++;

    if (toolCallCount > MaxToolCalls)
    {
        Console.WriteLine($"[RATE LIMIT] Blocked call #{toolCallCount}");
        ctx.Cancel = true;
        return;
    }

    await next(ctx);
});
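ctx.Cancel skips one call at a time, so the model may keep issuing further tool requests. To end the tool-calling loop outright once the budget is spent, set ctx.Terminate instead (see the context table in Step 7):

pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
    toolCallCount++;

    if (toolCallCount > MaxToolCalls)
    {
        Console.WriteLine("[RATE LIMIT] Budget exhausted; ending the tool loop.");
        ctx.Terminate = true;  // stop the model from requesting further tool calls
        return;
    }

    await next(ctx);
});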

Step 9: Cache Tool Results

Avoid redundant tool invocations by caching results keyed on tool name and arguments. (Skip caching for time-sensitive tools such as DateTimeNow, whose results go stale.)

var toolCache = new Dictionary<string, ToolCallResult>();

pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
    string key = $"{ctx.ToolCall.Name}:{ctx.ToolCall.ArgumentsJson}";

    if (toolCache.TryGetValue(key, out var cached))
    {
        ctx.Result = cached;  // skip actual invocation
        return;
    }

    await next(ctx);

    if (ctx.Result != null)
        toolCache[key] = ctx.Result;
});

Step 10: Use Filters with Agents via AgentBuilder

The AgentBuilder.WithFilters() method accepts a FilterPipeline instance or an inline configuration callback:

using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.TextGeneration.Filters;

// Option A: pass a pre-built pipeline
var pipeline = new FilterPipeline()
    .AddPromptFilter(async (ctx, next) => { /* ... */ await next(ctx); })
    .AddToolInvocationFilter(async (ctx, next) => { /* ... */ await next(ctx); });

var agent = Agent.CreateBuilder(model)
    .WithInstruction("You are a helpful assistant.")
    .WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
    .WithFilters(pipeline)
    .Build();

// Option B: configure inline
var agent2 = Agent.CreateBuilder(model)
    .WithInstruction("You are a helpful assistant.")
    .WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
    .WithFilters(filters =>
    {
        filters.AddPromptFilter(async (ctx, next) =>
        {
            Console.WriteLine($"[Agent] Prompt: {ctx.Prompt}");
            await next(ctx);
        });
    })
    .Build();

Step 11: Implement Class-Based Filters

For reusable, testable filters, implement the filter interfaces directly:

using LMKit.TextGeneration.Filters;
using Microsoft.Extensions.Logging;

public class AuditPromptFilter : IPromptFilter
{
    private readonly ILogger _logger;

    public AuditPromptFilter(ILogger logger)
    {
        _logger = logger;
    }

    public async Task OnPromptAsync(
        PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        _logger.LogInformation("Prompt received: {Prompt}", context.Prompt);
        await next(context);
        _logger.LogInformation("Inference completed for prompt");
    }
}

// Register with pipeline
var pipeline = new FilterPipeline();
pipeline.PromptFilters.Add(new AuditPromptFilter(logger));

The three filter interfaces:

Interface              Method                                                     When It Runs
---------              ------                                                     ------------
IPromptFilter          OnPromptAsync(PromptFilterContext, next)                   Before and after inference
ICompletionFilter      OnCompletionAsync(CompletionFilterContext, next)           After inference produces a result
IToolInvocationFilter  OnToolInvocationAsync(ToolInvocationFilterContext, next)   Around each individual tool call
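For comparison, here is a class-based tool invocation filter implementing the budget logic from Step 8. This is a minimal sketch, assuming the pipeline exposes a ToolInvocationFilters collection analogous to the PromptFilters collection shown above:

using LMKit.TextGeneration.Filters;

public class ToolCallBudgetFilter : IToolInvocationFilter
{
    private readonly int _maxCalls;
    private int _count;

    public ToolCallBudgetFilter(int maxCalls) => _maxCalls = maxCalls;

    public async Task OnToolInvocationAsync(
        ToolInvocationFilterContext context,
        Func<ToolInvocationFilterContext, Task> next)
    {
        if (++_count > _maxCalls)
        {
            context.Cancel = true;  // skip execution once the budget is spent
            return;
        }

        await next(context);
    }
}

// Assumed registration, mirroring pipeline.PromptFilters.Add(...) above
pipeline.ToolInvocationFilters.Add(new ToolCallBudgetFilter(maxCalls: 10));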

Filters vs. Events

Filters and events are complementary. Filters execute first; events fire afterward. Use the approach that best fits your scenario:

Capability                          Filters                       Events
----------                          -------                       ------
Modify prompt before inference      Yes (ctx.Prompt = ...)        No
Short-circuit inference             Yes (ctx.Result = ...)        No
Compose multiple middleware layers  Yes (onion pattern)           Limited (multiple handlers)
Share state across stages           Yes (Properties dictionary)   No built-in mechanism
Cancel a tool call                  Yes (ctx.Cancel = true)       Yes (e.Cancel = true)
Override tool result                Yes (ctx.Result = ...)        No
Terminate tool-calling loop         Yes (ctx.Terminate = true)    No
Simple one-off logging              Either works                  Simpler to set up

Common Issues

Problem                                Cause                                                    Fix
-------                                -----                                                    ---
Filters not executing                  Pipeline not assigned                                    Set chat.Filters = pipeline or use AgentBuilder.WithFilters()
Inference skipped unexpectedly         A prompt filter sets ctx.Result without calling next()   Check all prompt filters for unintended short-circuits
Tool filter does not see all calls     Filter added after Submit()                              Add filters before submitting prompts
Properties empty in completion filter  Different Dictionary instances                           Ensure the same Properties dictionary flows through both contexts (automatic with MultiTurnConversation.Filters)
Filters fire but events do not         Not a bug                                                Filters and events are independent; attach event handlers separately if needed

Next Steps