Understanding Filters and Middleware in LM-Kit.NET


TL;DR

Filters (also called middleware) are composable interceptors that wrap the prompt, completion, and tool invocation stages of a conversation or agent. They follow an onion model: each filter can inspect and modify data on the way in, call the next filter, then inspect and modify the result on the way out. In LM-Kit.NET, the FilterPipeline class holds ordered lists of IPromptFilter, ICompletionFilter, and IToolInvocationFilter instances. Attach a pipeline to any MultiTurnConversation or Agent to add logging, content moderation, caching, PII redaction, telemetry, or custom guardrails without modifying core inference logic.


What are Filters?

Definition: A filter is a function that intercepts a specific stage of the LLM request lifecycle. Filters receive a context object and a next delegate. Calling next passes control to the next filter in the chain (and ultimately to the inference engine). Code before next runs on the way in; code after next runs on the way out.

User prompt
   |
   v
+--------------------+
| Prompt Filter 1    |  --> can rewrite prompt, short-circuit, log
|   Prompt Filter 2  |  --> can add RAG context, redact PII
|     [Inference]    |  --> LLM generates completion
|   Completion Flt 2 |  --> can validate output, cache result
| Completion Flt 1   |  --> can moderate content, log metrics
+--------------------+
   |
   v
Response to user

Tool calls (if any):
+--------------------+
| Tool Inv. Filter 1 |  --> can rate-limit, log, override result
|   Tool Inv. Flt 2  |  --> can cancel execution, cache
|     [Tool Exec]    |  --> actual tool runs
|   Tool Inv. Flt 2  |  --> inspect result, transform
| Tool Inv. Filter 1 |  --> audit trail
+--------------------+

This pattern is sometimes called middleware because it resembles the HTTP middleware pipeline found in ASP.NET Core and similar frameworks.
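The onion model can be illustrated without any LM-Kit.NET types at all. The following self-contained sketch composes two hand-rolled "filters" around a fake inference step; all names here are illustrative, not part of any library:

```csharp
using System;

// Library-independent sketch of the onion model: each filter wraps the
// next handler, so "before" code runs in add order and "after" code
// runs in reverse order.
class OnionDemo
{
    delegate string Handler(string prompt);

    static Handler Wrap(string name, Handler next) => prompt =>
    {
        Console.WriteLine($"{name}: in");   // runs on the way in
        string result = next(prompt);       // pass control inward
        Console.WriteLine($"{name}: out");  // runs on the way out
        return result;                      // could transform the result here
    };

    static void Main()
    {
        Handler inference = prompt => $"completion for '{prompt}'";

        // First wrap applied last = outermost layer, matching the diagram above.
        Handler pipeline = Wrap("Filter 1", Wrap("Filter 2", inference));

        Console.WriteLine(pipeline("hello"));
        // Prints: Filter 1: in, Filter 2: in, Filter 2: out, Filter 1: out,
        // then the completion text.
    }
}
```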


Why Use Filters?

  1. Separation of Concerns: Keep cross-cutting behavior (logging, moderation, caching) out of your business logic. Each filter does one job.
  2. Composability: Stack filters in any order. Add or remove a filter without touching other code.
  3. Short-Circuiting: A prompt filter can return a cached response without ever calling inference. A tool filter can block a dangerous tool call without executing it.
  4. Observability: Wrap every LLM call with timing, token counting, and cost tracking at a single interception point.
  5. Safety and Compliance: Enforce guardrails as filters: PII redaction before inference, content moderation after inference, and approval gates around tool calls.

The Three Filter Types

IPromptFilter

Intercepts the prompt before it reaches the model and the result after inference completes.

Common uses:

  • Content moderation and prompt injection detection
  • PII redaction (strip sensitive data before sending to the model)
  • RAG context injection (append retrieved documents to the prompt)
  • Semantic caching (return a cached answer for similar prompts)
  • Logging and telemetry

Context properties (PromptFilterContext):

  • Prompt (read/write): the user's prompt text, rewritable by the filter
  • ChatHistory: the full conversation history
  • SystemPrompt: the configured system prompt
  • IsToolResponse: whether this prompt is a tool-result re-submission
  • Properties: shared state bag across all filters in the pipeline
  • Result (write): set to a TextGenerationResult to short-circuit inference entirely
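Putting these properties together, a minimal PII-redaction filter might look like the sketch below. The email regex, the class name, and the "pii.redactedCount" key are illustrative choices; only the context members listed above are assumed:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using LMKit.TextGeneration.Filters;

// Hypothetical example filter; the regex and Properties key are illustrative.
public class PiiRedactionPromptFilter : IPromptFilter
{
    private static readonly Regex Email =
        new(@"[\w.+-]+@[\w-]+\.[\w.]+", RegexOptions.Compiled);

    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        // Rewrite the prompt before it reaches the model.
        int redacted = 0;
        context.Prompt = Email.Replace(context.Prompt,
            _ => { redacted++; return "[EMAIL]"; });

        // Share what happened with downstream filters via the state bag.
        context.Properties["pii.redactedCount"] = redacted;

        await next(context);
    }
}
```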

ICompletionFilter

Intercepts the model's completion after inference produces a result.

Common uses:

  • Output quality validation (reject low-confidence responses)
  • Content moderation and safety filtering
  • Response transformation (formatting, translation, redaction)
  • Response caching for future similar prompts
  • Token usage tracking and cost accounting

Context properties (CompletionFilterContext):

  • Prompt: the prompt as submitted (after any rewriting by prompt filters)
  • Result (read/write): the completion, replaceable by the filter
  • ChatHistory: the full conversation history
  • Properties: shared state bag (same instance as the prompt filter context)

IToolInvocationFilter

Intercepts each individual tool call during the automatic tool-calling loop, after the permission policy has been evaluated.

Common uses:

  • Logging and audit trails for every tool invocation
  • Rate limiting (deny if too many calls per session)
  • Result caching (return a cached tool result without re-executing)
  • Error recovery (catch exceptions and provide fallback results)
  • Early termination (stop the tool-calling loop after a condition is met)

Context properties (ToolInvocationFilterContext):

  • ToolCall: the tool call requested by the model
  • Tool: the tool instance that will be invoked
  • PermissionResult: the decision from the permission policy
  • RequestIndex / ToolIndex / ToolCount: position metadata
  • Cancel (write): skip this tool's execution
  • Result (write): override the tool result without executing
  • Terminate (write): stop the tool-calling loop after the current batch
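As an illustration, a rate-limiting filter could use Cancel to deny excess calls. This is a sketch: the 10-call limit is arbitrary, and the method name OnToolInvocationAsync is inferred from the naming of the other two interfaces rather than confirmed:

```csharp
using System;
using System.Threading.Tasks;
using LMKit.TextGeneration.Filters;

// Sketch of a per-session rate limiter; limit and method name are assumptions.
public class RateLimitToolFilter : IToolInvocationFilter
{
    private const int MaxCallsPerSession = 10;
    private int _callCount;

    public async Task OnToolInvocationAsync(ToolInvocationFilterContext context,
        Func<ToolInvocationFilterContext, Task> next)
    {
        if (++_callCount > MaxCallsPerSession)
        {
            context.Cancel = true; // Skip execution of this tool call
            return;                // Short-circuit: later filters never run
        }

        await next(context);
    }
}
```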

Code Example

Class-Based Filters

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Filters;

// A prompt filter that logs every user prompt
public class LoggingPromptFilter : IPromptFilter
{
    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        Console.WriteLine($"[LOG] Prompt: {context.Prompt[..Math.Min(80, context.Prompt.Length)]}...");
        var sw = System.Diagnostics.Stopwatch.StartNew();

        await next(context); // Call the next filter (or inference)

        sw.Stop();
        Console.WriteLine($"[LOG] Completed in {sw.ElapsedMilliseconds}ms");
    }
}

// A completion filter that blocks unsafe output
public class ModerationCompletionFilter : ICompletionFilter
{
    public async Task OnCompletionAsync(CompletionFilterContext context,
        Func<CompletionFilterContext, Task> next)
    {
        await next(context);

        if (ContainsUnsafeContent(context.Result.TextContent))
        {
            context.Result = new TextGenerationResult("I cannot provide that response.");
        }
    }

    // Placeholder check for demonstration; substitute real moderation logic.
    private bool ContainsUnsafeContent(string text) =>
        text.Contains("unsafe", StringComparison.OrdinalIgnoreCase);
}

// Wire up filters on a conversation
var model = LM.LoadFromModelID("gemma3:12b");
using var chat = new MultiTurnConversation(model);

chat.Filters.PromptFilters.Add(new LoggingPromptFilter());
chat.Filters.CompletionFilters.Add(new ModerationCompletionFilter());

Inline Lambda Filters

using LMKit.Model;
using LMKit.Agents;
using LMKit.TextGeneration.Filters;

var model = LM.LoadFromModelID("glm4.7-flash");

// Build an agent with inline filters using extension methods
var agent = Agent.CreateBuilder(model)
    .WithFilters(filters =>
    {
        // Prompt filter: inject a reminder into every prompt
        filters.AddPromptFilter(async (context, next) =>
        {
            context.Prompt = context.Prompt + "\n\nRemember to cite your sources.";
            await next(context);
        });

        // Completion filter: track token usage
        filters.AddCompletionFilter(async (context, next) =>
        {
            await next(context);
            Console.WriteLine($"Tokens used: {context.Result.TokensUsed}");
        });

        // Tool invocation filter: log every tool call
        filters.AddToolInvocationFilter(async (context, next) =>
        {
            Console.WriteLine($"Tool: {context.ToolCall.Name} [{context.ToolIndex + 1}/{context.ToolCount}]");
            await next(context);
            Console.WriteLine($"Result: {context.Result?.Content?[..Math.Min(100, context.Result.Content.Length)]}");
        });
    })
    .Build();

Short-Circuiting with Semantic Caching

using LMKit.TextGeneration.Filters;

public class SemanticCacheFilter : IPromptFilter
{
    // Exact-match cache for demonstration; a true semantic cache would
    // match prompts by embedding similarity rather than string equality.
    private readonly Dictionary<string, TextGenerationResult> _cache = new();

    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        // Check cache before calling inference
        if (_cache.TryGetValue(context.Prompt, out var cached))
        {
            context.Result = cached; // Short-circuit: skip inference entirely
            return;
        }

        await next(context); // No cache hit, run inference

        // Cache the result for future calls
        _cache[context.Prompt] = context.Result;
    }
}

Filter Execution Order

Filters execute in the order they are added to the pipeline, following the onion model: the first filter added is the outermost layer, so its code before next runs first on the way in and its code after next runs last on the way out. This means:

Stage                  Execution Order
Prompt (inbound)       Filter 1 → Filter 2 → Filter 3 → Inference
Completion (outbound)  Filter 3 → Filter 2 → Filter 1 → Response
Tool invocation        Filter 1 → Filter 2 → Tool Execution → Filter 2 → Filter 1

The Properties dictionary on each context object is shared across all filters in a single request, enabling cross-filter communication (e.g., a prompt filter starts a timer that a completion filter later reads).
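As a sketch of that pattern (the "timing.stopwatch" key is an arbitrary choice, and Properties is assumed to behave like a string-keyed object dictionary):

```csharp
// Prompt filter starts a stopwatch and stashes it in the shared state bag.
filters.AddPromptFilter(async (context, next) =>
{
    context.Properties["timing.stopwatch"] = System.Diagnostics.Stopwatch.StartNew();
    await next(context);
});

// Completion filter retrieves the same stopwatch instance after inference.
filters.AddCompletionFilter(async (context, next) =>
{
    await next(context);
    if (context.Properties["timing.stopwatch"] is System.Diagnostics.Stopwatch sw)
    {
        Console.WriteLine($"End-to-end latency: {sw.ElapsedMilliseconds}ms");
    }
});
```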


Key Terms

  • Filter Pipeline: An ordered collection of filters (FilterPipeline) that intercepts the three stages of LLM interaction.
  • Prompt Filter: Intercepts before and after inference, can rewrite prompts or short-circuit with cached results.
  • Completion Filter: Intercepts after inference, can validate, transform, or replace the model's output.
  • Tool Invocation Filter: Intercepts each tool call, can log, cancel, cache, or terminate the tool-calling loop.
  • Short-Circuit: Returning a result from a filter without calling next, skipping all subsequent filters and inference.
  • Onion Model: The nested execution pattern where each filter wraps the next, with inbound and outbound phases.
  • Cross-Filter State: The shared Properties dictionary that allows filters to pass data to each other.



Summary

Filters and middleware provide a clean, composable way to intercept every stage of the LLM interaction lifecycle in LM-Kit.NET. The FilterPipeline class organizes three types of filters: IPromptFilter for rewriting or caching prompts before inference, ICompletionFilter for validating or transforming outputs after inference, and IToolInvocationFilter for logging, rate-limiting, or canceling tool calls. Filters follow an onion execution model, can short-circuit the pipeline, and share state across stages. They attach to both MultiTurnConversation and Agent instances, making them the recommended approach for adding cross-cutting concerns like observability, moderation, and compliance to any LM-Kit.NET application.
