How Do I Add Content Moderation and Safety Filters to Conversations?
TL;DR
LM-Kit.NET provides a middleware filter pipeline with three interception points: prompt filters (inspect/modify input before inference), completion filters (validate/transform output after inference), and tool invocation filters (control tool execution). Filters follow an onion pattern: code before await next() runs pre-processing, code after runs post-processing. Use filters for content moderation, prompt injection detection, output sanitization, quality gates, and custom guardrails.
The Three Filter Types
Prompt Filters (Input Safety)
Intercept user input before it reaches the model:
```csharp
using LMKit.TextGeneration.Filters;

chat.Filters.AddPromptFilter(async (context, next) =>
{
    string input = context.Prompt;

    // Block prohibited content
    if (ContainsProhibitedContent(input))
    {
        context.Result = "I'm unable to process that request.";
        return; // Skip inference entirely
    }

    // Sanitize input
    context.Prompt = SanitizeInput(input);

    await next(context); // Continue to model
});
```
Use cases:
- Prompt injection detection and blocking
- Input sanitization (remove scripts, special characters)
- Content policy enforcement
- Rate limiting per user
- Input logging for audit
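The example above leaves `ContainsProhibitedContent` and `SanitizeInput` to the application. A minimal sketch of what they might look like — the helper names come from the example, while the blocklist and regex are purely illustrative (a production system would use a maintained policy list or a dedicated moderation model):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class InputSafety
{
    // Illustrative blocklist only; real deployments need a broader,
    // regularly updated set of patterns.
    private static readonly string[] ProhibitedTerms =
    {
        "ignore previous instructions",
        "reveal your system prompt"
    };

    public static bool ContainsProhibitedContent(string input) =>
        ProhibitedTerms.Any(t =>
            input.Contains(t, StringComparison.OrdinalIgnoreCase));

    public static string SanitizeInput(string input)
    {
        // Strip script tags, then drop control characters (keeping newlines).
        string noScripts = Regex.Replace(input, @"<script\b[^>]*>.*?</script>",
            string.Empty, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        var cleaned = noScripts.Where(c => !char.IsControl(c) || c == '\n');
        return new string(cleaned.ToArray()).Trim();
    }
}
```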
Completion Filters (Output Safety)
Validate and transform model output after generation:
```csharp
chat.Filters.AddCompletionFilter(async (context, next) =>
{
    await next(context); // Wait for model to generate

    string output = context.Result.Text;

    // Quality gate: reject very short or empty responses
    if (string.IsNullOrWhiteSpace(output) || output.Length < 20)
    {
        context.Result = CreateFallbackResponse("Could you rephrase your question?");
        return;
    }

    // Sanitize output
    if (ContainsSensitiveData(output))
    {
        context.Result = RedactSensitiveData(output);
    }
});
```
Use cases:
- Output content moderation
- PII redaction from responses
- Quality gates (minimum length, coherence checks)
- Response caching
- Telemetry and token usage tracking
- Custom guardrails (policy-violating content rejection)
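`ContainsSensitiveData` and `RedactSensitiveData` are likewise application-defined. A minimal regex-based sketch (the helper names come from the example; the patterns cover only email addresses and US-style SSNs and are illustrative, not a complete PII detector):

```csharp
using System.Text.RegularExpressions;

static class OutputSafety
{
    // Illustrative patterns: email addresses and US-style SSNs.
    private static readonly Regex Email =
        new(@"[\w.+-]+@[\w-]+\.[\w.]+", RegexOptions.Compiled);
    private static readonly Regex Ssn =
        new(@"\b\d{3}-\d{2}-\d{4}\b", RegexOptions.Compiled);

    public static bool ContainsSensitiveData(string output) =>
        Email.IsMatch(output) || Ssn.IsMatch(output);

    public static string RedactSensitiveData(string output) =>
        Ssn.Replace(Email.Replace(output, "[redacted email]"), "[redacted SSN]");
}
```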
Tool Invocation Filters (Action Safety)
Control which tools execute and how:
```csharp
chat.Filters.AddToolInvocationFilter(async (context, next) =>
{
    // Rate limit external API calls
    if (context.ToolName == "http_get" && IsRateLimited())
    {
        context.Result = "{\"error\": \"Rate limit exceeded. Try again later.\"}";
        return; // Skip actual execution
    }

    // Log all tool calls for audit
    AuditLog.Record(context.ToolName, context.Arguments);

    await next(context); // Execute the tool

    // Validate tool results
    if (context.ToolName.StartsWith("filesystem_"))
    {
        ValidateFileAccess(context.Result);
    }
});
```
Use cases:
- Rate limiting external API calls
- Audit logging of all tool invocations
- Result caching to avoid repeated calls
- Error recovery with fallback results
- Early termination of tool-calling loops
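The `IsRateLimited()` check in the example is application-defined. One simple way to back it is a fixed-window counter — a sketch with illustrative names and limits, not part of the LM-Kit.NET API:

```csharp
using System;

// Minimal thread-safe fixed-window rate limiter. A production system
// might prefer a sliding window or token bucket for smoother limits.
sealed class FixedWindowRateLimiter
{
    private readonly int _maxCalls;
    private readonly TimeSpan _window;
    private readonly object _gate = new();
    private int _count;
    private DateTime _windowStart = DateTime.UtcNow;

    public FixedWindowRateLimiter(int maxCalls, TimeSpan window) =>
        (_maxCalls, _window) = (maxCalls, window);

    public bool IsRateLimited()
    {
        lock (_gate)
        {
            if (DateTime.UtcNow - _windowStart > _window)
            {
                _windowStart = DateTime.UtcNow;
                _count = 0; // Start a new window
            }
            return ++_count > _maxCalls;
        }
    }
}

// Usage: var limiter = new FixedWindowRateLimiter(10, TimeSpan.FromMinutes(1));
// then call limiter.IsRateLimited() inside the tool invocation filter.
```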
Filter Pipeline Order
Filters execute in registration order, wrapping each other like an onion:
```
User Input
  → Prompt Filter 1 (pre)
    → Prompt Filter 2 (pre)
      → Model Inference
    ← Prompt Filter 2 (post)
  ← Prompt Filter 1 (post)
Response
```
Register the most important filters first (outermost layer).
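The wrapping behavior is easy to observe with two logging filters, using the same `AddPromptFilter` shape as the examples above:

```csharp
chat.Filters.AddPromptFilter(async (context, next) =>
{
    Console.WriteLine("Filter 1: pre");   // outermost, runs first
    await next(context);
    Console.WriteLine("Filter 1: post");  // outermost, runs last
});

chat.Filters.AddPromptFilter(async (context, next) =>
{
    Console.WriteLine("Filter 2: pre");   // innermost, runs just before inference
    await next(context);
    Console.WriteLine("Filter 2: post");  // innermost, runs just after inference
});

// Per the diagram above, the console order is:
// Filter 1: pre, Filter 2: pre, <inference>, Filter 2: post, Filter 1: post
```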
Class-Based Filters
For reusable filters, implement the interface directly:
```csharp
public class ContentModerationFilter : IPromptFilter
{
    private readonly ContentPolicy _policy;

    public ContentModerationFilter(ContentPolicy policy) => _policy = policy;

    public async Task OnPromptAsync(PromptFilterContext context, Func<PromptFilterContext, Task> next)
    {
        var violations = _policy.Check(context.Prompt);
        if (violations.Any())
        {
            context.Result = $"Content policy violation: {violations.First().Reason}";
            return;
        }
        await next(context);
    }
}

// Register
chat.Filters.Add(new ContentModerationFilter(myPolicy));
```
Combining Filters with Permission Policies
Filters work alongside ToolPermissionPolicy for layered security:
- ToolPermissionPolicy evaluates first: allow, deny, or require approval.
- Tool Invocation Filters run second: logging, rate limiting, validation.
- Tool executes if both layers pass.
This provides defense in depth: policies handle broad access control, filters handle specific runtime behavior.
📚 Related Content
- How do I prevent an agent from misusing tools?: Permission policies for tool access control.
- How do I monitor and debug AI agent execution?: Observability and logging patterns.
- How can I reduce hallucinations in local AI models?: Output quality improvement techniques.
- Add Middleware Filters to Agents and Conversations: Step-by-step filter implementation guide.