Add Middleware Filters to Agents and Conversations
LM-Kit.NET provides a FilterPipeline that lets you attach middleware-style filters to intercept three stages of text generation: prompts (before inference), completions (after inference), and tool invocations (during the tool-calling loop). Filters follow the ASP.NET Core onion pattern: code before await next(context) runs on the way in, code after runs on the way out. This guide walks through each filter type with practical examples for logging, prompt rewriting, caching, rate limiting, and telemetry.
Why Middleware Filters
Two enterprise scenarios where filters provide cleaner solutions than ad-hoc event handlers:
- Composable cross-cutting concerns. Logging, caching, rate limiting, and moderation are separate responsibilities that should be independently developed, tested, and stacked. Filters let you compose them as a pipeline without modifying application logic or tool implementations.
- Prompt-level control. Unlike events, prompt filters can rewrite the input before inference or short-circuit entirely (e.g., returning a cached response). This enables patterns like semantic caching and input sanitization that are not possible with BeforeToolInvocation events alone.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n FilterPipelineQuickstart
cd FilterPipelineQuickstart
dotnet add package LM-Kit.NET
Step 2: Add a Prompt Filter for Logging
A prompt filter intercepts every submission before it reaches the model. Use it for logging, PII redaction, prompt augmentation, or caching.
using System.Diagnostics;
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Create conversation with prompt filter
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512
};
var stopwatch = new Stopwatch();
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
// Before inference
Console.WriteLine($"[LOG] Prompt: \"{ctx.Prompt}\"");
Console.WriteLine($"[LOG] System prompt: \"{ctx.SystemPrompt}\"");
Console.WriteLine($"[LOG] Is tool response: {ctx.IsToolResponse}");
stopwatch.Restart();
await next(ctx); // run inference
stopwatch.Stop();
// After inference
Console.WriteLine($"[LOG] Inference completed in {stopwatch.ElapsedMilliseconds}ms");
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
// ──────────────────────────────────────
// 3. Run a prompt
// ──────────────────────────────────────
Console.Write("Assistant: ");
chat.Submit("What is the middleware pattern?");
Console.WriteLine("\n");
The filter wraps the entire inference call. Code before next() runs first, then inference, then code after next().
Step 3: Rewrite Prompts Before Inference
Modify ctx.Prompt to change what the model sees without changing the caller's message:
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
// Append a constraint the model should follow
ctx.Prompt = ctx.Prompt + "\n\nRespond in three sentences or fewer.";
await next(ctx);
});
Multiple prompt filters execute in onion order. If Filter A is added before Filter B:
Filter A (before) → Filter B (before) → Inference → Filter B (after) → Filter A (after)
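This ordering can be demonstrated without any LM-Kit types. The sketch below is a plain-delegate stand-in (all names are illustrative, not LM-Kit APIs) that composes two layers by hand, showing how code on either side of await next(...) nests:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();
void Trace(string step) { log.Add(step); Console.WriteLine(step); }

// Illustrative stand-ins for prompt filters: each receives the prompt and the next stage.
Func<string, Func<string, Task>, Task> filterA = async (prompt, next) =>
{
    Trace("Filter A (before)");
    await next(prompt);              // descend into Filter B
    Trace("Filter A (after)");
};
Func<string, Func<string, Task>, Task> filterB = async (prompt, next) =>
{
    Trace("Filter B (before)");
    await next(prompt);              // descend into the terminal stage
    Trace("Filter B (after)");
};

// Terminal stage: stands in for the actual inference call.
Func<string, Task> inference = _ => { Trace("Inference"); return Task.CompletedTask; };

// Compose: A wraps B, B wraps inference, matching AddPromptFilter registration order.
await filterA("prompt", p => filterB(p, inference));
// log now holds: A before, B before, Inference, B after, A after
```

The key design point carries over directly: registration order determines nesting order, so the first filter added sees the prompt first and the result last.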
Step 4: Short-Circuit with a Cached Response
If a prompt filter sets ctx.Result before calling next(), inference is skipped entirely. This is the foundation for semantic caching:
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512
};
// Simple exact-match response cache
var responseCache = new Dictionary<string, TextGenerationResult>();
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
if (responseCache.TryGetValue(ctx.Prompt, out var cached))
{
Console.WriteLine("[CACHE HIT] Returning cached response.");
ctx.Result = cached; // short-circuit: inference is skipped
return;
}
await next(ctx);
// Store result for future lookups
if (ctx.Result != null)
responseCache[ctx.Prompt] = ctx.Result;
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("First call: ");
chat.Submit("What is 2+2?");
Console.Write("\n\nSecond call (same prompt): ");
chat.Submit("What is 2+2?");
Console.WriteLine("\n");
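Exact-match keys are brittle: "What is 2+2?" and "  what is 2+2? " miss each other. A lightweight improvement is to normalize whitespace and casing before the dictionary lookup; true semantic caching would compare embeddings instead. NormalizeKey below is a hypothetical helper, not an LM-Kit API:

```csharp
using System;

Console.WriteLine(NormalizeKey("  What  is 2+2? ")); // "what is 2+2?"

// Hypothetical helper: collapse whitespace and casing so near-identical
// prompt phrasings resolve to the same cache key.
static string NormalizeKey(string prompt) =>
    string.Join(' ', prompt.Trim().ToLowerInvariant()
        .Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));
```

With this in place, use responseCache with NormalizeKey(ctx.Prompt) as the key for both the lookup and the store.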
Step 5: Add a Completion Filter for Telemetry
Completion filters run after inference produces a result. Use them for telemetry, quality gates, output transformation, or response caching:
chat.Filters = new FilterPipeline()
.AddCompletionFilter(async (ctx, next) =>
{
await next(ctx);
// ctx.Result is now populated
if (ctx.Result != null)
{
Console.WriteLine($"[TELEMETRY] Tokens: {ctx.Result.GeneratedTokens.Count}");
Console.WriteLine($"[TELEMETRY] Speed: {ctx.Result.TokenGenerationRate:F1} tok/s");
Console.WriteLine($"[TELEMETRY] Quality: {ctx.Result.QualityScore:F2}");
}
});
Step 6: Share State Between Prompt and Completion Filters
The Properties dictionary is shared by reference between all filter contexts within a single request. Use it to pass data from prompt filters to completion filters:
chat.Filters = new FilterPipeline()
.AddPromptFilter(async (ctx, next) =>
{
ctx.Properties["requestId"] = Guid.NewGuid().ToString("N")[..8];
ctx.Properties["startTimestamp"] = Stopwatch.GetTimestamp();
Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Started");
await next(ctx);
})
.AddCompletionFilter(async (ctx, next) =>
{
await next(ctx);
var start = (long)ctx.Properties["startTimestamp"];
double ms = (double)(Stopwatch.GetTimestamp() - start) / Stopwatch.Frequency * 1000;
Console.WriteLine($"[REQ {ctx.Properties["requestId"]}] Completed in {ms:F0}ms");
});
Step 7: Add Tool Invocation Filters
Tool invocation filters wrap each individual tool call during the automatic tool-calling loop. They execute after the permission policy evaluation but before the actual tool invocation.
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Chat;
using LMKit.TextGeneration.Filters;
using LMKit.Agents.Tools;
using LMKit.Agents.Tools.BuiltIn;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant with tools.",
MaximumCompletionTokens = 512
};
chat.Tools.Register(BuiltInTools.CalcArithmetic);
chat.Tools.Register(BuiltInTools.DateTimeNow);
chat.Filters = new FilterPipeline()
.AddToolInvocationFilter(async (ctx, next) =>
{
Console.WriteLine($"\n [TOOL] {ctx.ToolCall.Name}({ctx.ToolCall.ArgumentsJson})");
Console.WriteLine($" [TOOL] Batch: {ctx.ToolIndex + 1}/{ctx.ToolCount}, Request cycle: {ctx.RequestIndex}");
await next(ctx);
Console.WriteLine($" [TOOL] Result: {ctx.Result?.ResultJson}");
});
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("Assistant: ");
chat.Submit("What is 256 * 789? And what day is it today?");
Console.WriteLine("\n");
Tool Filter Context Properties
| Property | Type | Description |
|---|---|---|
| ToolCall | ToolCall | The requested tool call (name and arguments) |
| Tool | ITool | The tool instance that will be invoked |
| PermissionResult | ToolPermissionResult | Result from permission policy evaluation |
| RequestIndex | int | Zero-based LLM request cycle index |
| ToolIndex | int | Position within the current batch of tool calls |
| ToolCount | int | Total tools in the current batch |
| Cancel | bool | Set to true to skip this tool's execution |
| Result | ToolCallResult? | Set to override the tool result (short-circuits execution) |
| Terminate | bool | Set to true to stop the model from requesting further tool calls |
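None of the numbered steps exercises Terminate. A minimal sketch in the same fragment style as Steps 8 and 9 (pipeline being a FilterPipeline assigned to chat.Filters) that caps the loop at two LLM request cycles might look like this; treat it as an assumption-laden sketch built from the properties in the table above, not a verified recipe:

```csharp
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
    await next(ctx); // let the current tool run normally

    // RequestIndex is zero-based: once we are in the second cycle (index 1),
    // tell the model not to request any further tool calls.
    if (ctx.RequestIndex >= 1)
        ctx.Terminate = true;
});
```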
Step 8: Rate-Limit Tool Calls
Prevent runaway agents from making too many tool calls by counting invocations:
int toolCallCount = 0;
const int MaxToolCalls = 10;
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
toolCallCount++;
if (toolCallCount > MaxToolCalls)
{
Console.WriteLine($"[RATE LIMIT] Blocked call #{toolCallCount}");
ctx.Cancel = true;
return;
}
await next(ctx);
});
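The plain int counter above is fine for a single conversation on one thread. If one pipeline instance serves concurrent requests, an atomic increment avoids lost updates. The following is a self-contained sketch of just the counting logic (no LM-Kit types; TryAcquireSlot is an illustrative name):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int toolCallCount = 0;
const int MaxToolCalls = 10;

// Stands in for the filter's admission check: true while the budget lasts.
// Interlocked.Increment makes the check safe under concurrent tool calls.
bool TryAcquireSlot() => Interlocked.Increment(ref toolCallCount) <= MaxToolCalls;

// Simulate 15 concurrent tool calls; exactly 10 increments are admitted.
int allowed = 0;
await Task.WhenAll(Enumerable.Range(0, 15).Select(_ => Task.Run(() =>
{
    if (TryAcquireSlot())
        Interlocked.Increment(ref allowed);
})));
Console.WriteLine($"Allowed: {allowed}"); // Allowed: 10
```

Inside the filter, a failed TryAcquireSlot() would set ctx.Cancel = true and return, exactly as in Step 8.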
Step 9: Cache Tool Results
Avoid redundant tool invocations by caching results keyed on tool name and arguments:
var toolCache = new Dictionary<string, ToolCallResult>();
pipeline.AddToolInvocationFilter(async (ctx, next) =>
{
string key = $"{ctx.ToolCall.Name}:{ctx.ToolCall.ArgumentsJson}";
if (toolCache.TryGetValue(key, out var cached))
{
ctx.Result = cached; // skip actual invocation
return;
}
await next(ctx);
if (ctx.Result != null)
toolCache[key] = ctx.Result;
});
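One refinement worth considering: results from a tool like DateTimeNow go stale, so an unbounded cache keeps serving yesterday's answer. A small time-bounded cache can replace the plain Dictionary; TtlCache below is a generic sketch, not an LM-Kit type:

```csharp
using System;
using System.Collections.Generic;

var cache = new TtlCache<string>(TimeSpan.FromMinutes(5));
cache.Set("date_time_now:{}", "2025-01-01T12:00:00Z");
Console.WriteLine(cache.TryGet("date_time_now:{}", out var hit)); // True

// Illustrative sketch: entries older than the TTL count as misses and are evicted.
class TtlCache<TValue>
{
    private readonly Dictionary<string, (TValue Value, DateTime StoredAt)> _entries = new();
    private readonly TimeSpan _ttl;

    public TtlCache(TimeSpan ttl) => _ttl = ttl;

    public bool TryGet(string key, out TValue value)
    {
        if (_entries.TryGetValue(key, out var entry) &&
            DateTime.UtcNow - entry.StoredAt < _ttl)
        {
            value = entry.Value;
            return true;
        }
        _entries.Remove(key); // drop the expired entry (no-op if absent)
        value = default!;
        return false;
    }

    public void Set(string key, TValue value) => _entries[key] = (value, DateTime.UtcNow);
}
```

In the Step 9 filter, swap the Dictionary for this type and keep the same key format; you could also vary the TTL per tool so fast-changing tools expire sooner.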
Step 10: Use Filters with Agents via AgentBuilder
The AgentBuilder.WithFilters() method accepts a FilterPipeline instance or an inline configuration callback:
using LMKit.Agents;
using LMKit.Agents.Tools.BuiltIn;
using LMKit.TextGeneration.Filters;
// Option A: pass a pre-built pipeline
var pipeline = new FilterPipeline()
.AddPromptFilter(async (ctx, next) => { /* ... */ await next(ctx); })
.AddToolInvocationFilter(async (ctx, next) => { /* ... */ await next(ctx); });
var agent = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful assistant.")
.WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
.WithFilters(pipeline)
.Build();
// Option B: configure inline
var agent2 = Agent.CreateBuilder(model)
.WithInstruction("You are a helpful assistant.")
.WithTools(tools => tools.Register(BuiltInTools.CalcArithmetic))
.WithFilters(filters =>
{
filters.AddPromptFilter(async (ctx, next) =>
{
Console.WriteLine($"[Agent] Prompt: {ctx.Prompt}");
await next(ctx);
});
})
.Build();
Step 11: Implement Class-Based Filters
For reusable, testable filters, implement the filter interfaces directly:
using LMKit.TextGeneration.Filters;
using Microsoft.Extensions.Logging;
public class AuditPromptFilter : IPromptFilter
{
private readonly ILogger _logger;
public AuditPromptFilter(ILogger logger)
{
_logger = logger;
}
public async Task OnPromptAsync(
PromptFilterContext context,
Func<PromptFilterContext, Task> next)
{
_logger.LogInformation("Prompt received: {Prompt}", context.Prompt);
await next(context);
_logger.LogInformation("Inference completed for prompt");
}
}
// Register with pipeline
var pipeline = new FilterPipeline();
pipeline.PromptFilters.Add(new AuditPromptFilter(logger));
The three filter interfaces:
| Interface | Method | When It Runs |
|---|---|---|
| IPromptFilter | OnPromptAsync(PromptFilterContext, next) | Before and after inference |
| ICompletionFilter | OnCompletionAsync(CompletionFilterContext, next) | After inference produces a result |
| IToolInvocationFilter | OnToolInvocationAsync(ToolInvocationFilterContext, next) | Around each individual tool call |
Filters vs. Events
Filters and events are complementary. Filters execute first; events fire afterward. Use the approach that best fits your scenario:
| Capability | Filters | Events |
|---|---|---|
| Modify prompt before inference | Yes (ctx.Prompt = ...) | No |
| Short-circuit inference | Yes (ctx.Result = ...) | No |
| Compose multiple middleware layers | Yes (onion pattern) | Limited (multiple handlers) |
| Share state across stages | Yes (Properties dictionary) | No built-in mechanism |
| Cancel a tool call | Yes (ctx.Cancel = true) | Yes (e.Cancel = true) |
| Override tool result | Yes (ctx.Result = ...) | No |
| Terminate tool-calling loop | Yes (ctx.Terminate = true) | No |
| Simple one-off logging | Either works | Simpler to set up |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Filters not executing | Pipeline not assigned | Set chat.Filters = pipeline or use AgentBuilder.WithFilters() |
| Inference skipped unexpectedly | A prompt filter sets ctx.Result without calling next() | Check all prompt filters for unintended short-circuits |
| Tool filter does not see all calls | Filter added after Submit() | Add filters before submitting prompts |
| Properties empty in completion filter | Different Dictionary instances | Ensure the same Properties dictionary flows through both contexts (this is automatic when using MultiTurnConversation.Filters) |
| Filters fire but events do not | Not a bug | Filters and events are independent; attach event handlers separately if needed |
Next Steps
- Intercept and Control Tool Invocations: event-based tool interception (complementary to filters)
- Secure Agent Tool Access with Permission Policies: combine with ToolPermissionPolicy for defense-in-depth
- Create an AI Agent with Tools: build agents with the ITool interface
- Build a Resilient Production Agent: error handling, retries, and observability for production agents