Understanding Filters and Middleware in LM-Kit.NET


TL;DR

Filters (also called middleware) are composable interceptors that wrap the prompt, completion, and tool invocation stages of a conversation or agent. They follow an onion model: each filter can inspect and modify data on the way in, call the next filter, then inspect and modify the result on the way out. In LM-Kit.NET, the FilterPipeline class holds ordered lists of IPromptFilter, ICompletionFilter, and IToolInvocationFilter instances. Attach a pipeline to any MultiTurnConversation or Agent to add logging, content moderation, caching, PII redaction, telemetry, or custom guardrails without modifying core inference logic.


What are Filters?

Definition: A filter is a function that intercepts a specific stage of the LLM request lifecycle. Filters receive a context object and a next delegate. Calling next passes control to the next filter in the chain (and ultimately to the inference engine). Code before next runs on the way in; code after next runs on the way out.

User prompt
   |
   v
+--------------------+
| Prompt Filter 1    |  --> can rewrite prompt, short-circuit, log
|   Prompt Filter 2  |  --> can add RAG context, redact PII
|     [Inference]    |  --> LLM generates completion
|   Completion Flt 2 |  --> can validate output, cache result
| Completion Flt 1   |  --> can moderate content, log metrics
+--------------------+
   |
   v
Response to user

Tool calls (if any):
+--------------------+
| Tool Inv. Filter 1 |  --> can rate-limit, log, override result
|   Tool Inv. Flt 2  |  --> can cancel execution, cache
|     [Tool Exec]    |  --> actual tool runs
|   Tool Inv. Flt 2  |  --> inspect result, transform
| Tool Inv. Filter 1 |  --> audit trail
+--------------------+

This pattern is sometimes called middleware because it resembles the HTTP middleware pipeline found in ASP.NET Core and similar frameworks.
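The onion model can be illustrated without any LM-Kit.NET types at all. The following self-contained sketch composes two hand-rolled "filters" around a fake inference step; all names here are illustrative, not part of any library:

```csharp
using System;

// Library-independent sketch of the onion model: each filter wraps the
// next handler, so "before" code runs in add order and "after" code
// runs in reverse order.
class OnionDemo
{
    delegate string Handler(string prompt);

    static Handler Wrap(string name, Handler next) => prompt =>
    {
        Console.WriteLine($"{name}: in");   // runs on the way in
        string result = next(prompt);       // pass control inward
        Console.WriteLine($"{name}: out");  // runs on the way out
        return result;                      // could transform the result here
    };

    static void Main()
    {
        Handler inference = prompt => $"completion for '{prompt}'";

        // First wrap applied last = outermost layer, matching the diagram above.
        Handler pipeline = Wrap("Filter 1", Wrap("Filter 2", inference));

        Console.WriteLine(pipeline("hello"));
        // Prints: Filter 1: in, Filter 2: in, Filter 2: out, Filter 1: out,
        // then the completion text.
    }
}
```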


Why Use Filters?

  1. Separation of Concerns: Keep cross-cutting behavior (logging, moderation, caching) out of your business logic. Each filter does one job.
  2. Composability: Stack filters in any order. Add or remove a filter without touching other code.
  3. Short-Circuiting: A prompt filter can return a cached response without ever calling inference. A tool filter can block a dangerous tool call without executing it.
  4. Observability: Wrap every LLM call with timing, token counting, and cost tracking at a single interception point.
  5. Safety and Compliance: Enforce guardrails as filters: PII redaction before inference, content moderation after inference, and approval gates around tool calls.

The Three Filter Types

IPromptFilter

Intercepts the prompt before it reaches the model and the result after inference completes.

Common uses:

  • Content moderation and prompt injection detection
  • PII redaction (strip sensitive data before sending to the model)
  • RAG context injection (append retrieved documents to the prompt)
  • Semantic caching (return a cached answer for similar prompts)
  • Logging and telemetry

Context properties (PromptFilterContext):

  • Prompt (read/write): the user's prompt text, rewritable by the filter
  • ChatHistory: the full conversation history
  • SystemPrompt: the configured system prompt
  • IsToolResponse: whether this prompt is a tool-result re-submission
  • Properties: shared state bag across all filters in the pipeline
  • Result (write): set to a TextGenerationResult to short-circuit inference entirely
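Putting these properties together, a minimal PII-redaction filter might look like the sketch below. The email regex, the class name, and the "pii.redactedCount" key are illustrative choices; only the context members listed above are assumed:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using LMKit.TextGeneration.Filters;

// Hypothetical example filter; the regex and Properties key are illustrative.
public class PiiRedactionPromptFilter : IPromptFilter
{
    private static readonly Regex Email =
        new(@"[\w.+-]+@[\w-]+\.[\w.]+", RegexOptions.Compiled);

    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        // Rewrite the prompt before it reaches the model.
        int redacted = 0;
        context.Prompt = Email.Replace(context.Prompt,
            _ => { redacted++; return "[EMAIL]"; });

        // Share what happened with downstream filters via the state bag.
        context.Properties["pii.redactedCount"] = redacted;

        await next(context);
    }
}
```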

ICompletionFilter

Intercepts the model's completion after inference produces a result.

Common uses:

  • Output quality validation (reject low-confidence responses)
  • Content moderation and safety filtering
  • Response transformation (formatting, translation, redaction)
  • Response caching for future similar prompts
  • Token usage tracking and cost accounting

Context properties (CompletionFilterContext):

  • Prompt: the prompt as submitted (after any rewriting by prompt filters)
  • Result (read/write): the completion, replaceable by the filter
  • ChatHistory: the full conversation history
  • Properties: shared state bag (same instance as the prompt filter context)

IToolInvocationFilter

Intercepts each individual tool call during the automatic tool-calling loop, after the permission policy has been evaluated.

Common uses:

  • Logging and audit trails for every tool invocation
  • Rate limiting (deny if too many calls per session)
  • Result caching (return a cached tool result without re-executing)
  • Error recovery (catch exceptions and provide fallback results)
  • Early termination (stop the tool-calling loop after a condition is met)

Context properties (ToolInvocationFilterContext):

  • ToolCall: the tool call requested by the model
  • Tool: the tool instance that will be invoked
  • PermissionResult: the decision from the permission policy
  • RequestIndex / ToolIndex / ToolCount: position metadata
  • Cancel (write): skip this tool's execution
  • Result (write): override the tool result without executing
  • Terminate (write): stop the tool-calling loop after the current batch
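As an illustration, a rate-limiting filter could use Cancel to deny excess calls. This is a sketch: the 10-call limit is arbitrary, and the method name OnToolInvocationAsync is inferred from the naming of the other two interfaces rather than confirmed:

```csharp
using System;
using System.Threading.Tasks;
using LMKit.TextGeneration.Filters;

// Sketch of a per-session rate limiter; limit and method name are assumptions.
public class RateLimitToolFilter : IToolInvocationFilter
{
    private const int MaxCallsPerSession = 10;
    private int _callCount;

    public async Task OnToolInvocationAsync(ToolInvocationFilterContext context,
        Func<ToolInvocationFilterContext, Task> next)
    {
        if (++_callCount > MaxCallsPerSession)
        {
            context.Cancel = true; // Skip execution of this tool call
            return;                // Short-circuit: later filters never run
        }

        await next(context);
    }
}
```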

Code Example

Class-Based Filters

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Filters;

// A prompt filter that logs every user prompt
public class LoggingPromptFilter : IPromptFilter
{
    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        Console.WriteLine($"[LOG] Prompt: {context.Prompt[..Math.Min(80, context.Prompt.Length)]}...");
        var sw = System.Diagnostics.Stopwatch.StartNew();

        await next(context); // Call the next filter (or inference)

        sw.Stop();
        Console.WriteLine($"[LOG] Completed in {sw.ElapsedMilliseconds}ms");
    }
}

// A completion filter that blocks unsafe output
public class ModerationCompletionFilter : ICompletionFilter
{
    public async Task OnCompletionAsync(CompletionFilterContext context,
        Func<CompletionFilterContext, Task> next)
    {
        await next(context);

        if (ContainsUnsafeContent(context.Result.TextContent))
        {
            context.Result = new TextGenerationResult("I cannot provide that response.");
        }
    }

    // Placeholder check for demonstration; substitute real moderation logic.
    private bool ContainsUnsafeContent(string text) =>
        text.Contains("unsafe", StringComparison.OrdinalIgnoreCase);
}

// Wire up filters on a conversation
var model = LM.LoadFromModelID("gemma3:12b");
using var chat = new MultiTurnConversation(model);

chat.Filters.PromptFilters.Add(new LoggingPromptFilter());
chat.Filters.CompletionFilters.Add(new ModerationCompletionFilter());

Inline Lambda Filters

using LMKit.Model;
using LMKit.Agents;
using LMKit.TextGeneration.Filters;

var model = LM.LoadFromModelID("glm4.7-flash");

// Build an agent with inline filters using extension methods
var agent = Agent.CreateBuilder(model)
    .WithFilters(filters =>
    {
        // Prompt filter: inject a reminder into every prompt
        filters.AddPromptFilter(async (context, next) =>
        {
            context.Prompt = context.Prompt + "\n\nRemember to cite your sources.";
            await next(context);
        });

        // Completion filter: track token usage
        filters.AddCompletionFilter(async (context, next) =>
        {
            await next(context);
            Console.WriteLine($"Tokens used: {context.Result.TokensUsed}");
        });

        // Tool invocation filter: log every tool call
        filters.AddToolInvocationFilter(async (context, next) =>
        {
            Console.WriteLine($"Tool: {context.ToolCall.Name} [{context.ToolIndex + 1}/{context.ToolCount}]");
            await next(context);
            Console.WriteLine($"Result: {context.Result?.Content?[..Math.Min(100, context.Result.Content.Length)]}");
        });
    })
    .Build();

Short-Circuiting with Semantic Caching

using LMKit.TextGeneration.Filters;

public class SemanticCacheFilter : IPromptFilter
{
    // Exact-match cache for demonstration; a true semantic cache would
    // match prompts by embedding similarity rather than string equality.
    private readonly Dictionary<string, TextGenerationResult> _cache = new();

    public async Task OnPromptAsync(PromptFilterContext context,
        Func<PromptFilterContext, Task> next)
    {
        // Check cache before calling inference
        if (_cache.TryGetValue(context.Prompt, out var cached))
        {
            context.Result = cached; // Short-circuit: skip inference entirely
            return;
        }

        await next(context); // No cache hit, run inference

        // Cache the result for future calls
        _cache[context.Prompt] = context.Result;
    }
}

Filter Execution Order

Filters execute in the order they are added to the pipeline, following the onion model: the first filter added is the outermost layer, so its code before next runs first on the way in and its code after next runs last on the way out. This means:

Stage                  Execution Order
Prompt (inbound)       Filter 1 → Filter 2 → Filter 3 → Inference
Completion (outbound)  Filter 3 → Filter 2 → Filter 1 → Response
Tool invocation        Filter 1 → Filter 2 → Tool Execution → Filter 2 → Filter 1

The Properties dictionary on each context object is shared across all filters in a single request, enabling cross-filter communication (e.g., a prompt filter starts a timer that a completion filter later reads).
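As a sketch of that pattern (the "timing.stopwatch" key is an arbitrary choice, and Properties is assumed to behave like a string-keyed object dictionary):

```csharp
// Prompt filter starts a stopwatch and stashes it in the shared state bag.
filters.AddPromptFilter(async (context, next) =>
{
    context.Properties["timing.stopwatch"] = System.Diagnostics.Stopwatch.StartNew();
    await next(context);
});

// Completion filter retrieves the same stopwatch instance after inference.
filters.AddCompletionFilter(async (context, next) =>
{
    await next(context);
    if (context.Properties["timing.stopwatch"] is System.Diagnostics.Stopwatch sw)
    {
        Console.WriteLine($"End-to-end latency: {sw.ElapsedMilliseconds}ms");
    }
});
```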


Key Terms

  • Filter Pipeline: An ordered collection of filters (FilterPipeline) that intercepts the three stages of LLM interaction.
  • Prompt Filter: Intercepts before and after inference, can rewrite prompts or short-circuit with cached results.
  • Completion Filter: Intercepts after inference, can validate, transform, or replace the model's output.
  • Tool Invocation Filter: Intercepts each tool call, can log, cancel, cache, or terminate the tool-calling loop.
  • Short-Circuit: Returning a result from a filter without calling next, skipping all subsequent filters and inference.
  • Onion Model: The nested execution pattern where each filter wraps the next, with inbound and outbound phases.
  • Cross-Filter State: The shared Properties dictionary that allows filters to pass data to each other.



Summary

Filters and middleware provide a clean, composable way to intercept every stage of the LLM interaction lifecycle in LM-Kit.NET. The FilterPipeline class organizes three types of filters: IPromptFilter for rewriting or caching prompts before inference, ICompletionFilter for validating or transforming outputs after inference, and IToolInvocationFilter for logging, rate-limiting, or canceling tool calls. Filters follow an onion execution model, can short-circuit the pipeline, and share state across stages. They attach to both MultiTurnConversation and Agent instances, making them the recommended approach for adding cross-cutting concerns like observability, moderation, and compliance to any LM-Kit.NET application.
