How Do I Add Content Moderation and Safety Filters to Conversations?


TL;DR

LM-Kit.NET provides a middleware filter pipeline with three interception points: prompt filters (inspect/modify input before inference), completion filters (validate/transform output after inference), and tool invocation filters (control tool execution). Filters follow an onion pattern: code before await next() runs pre-processing, code after runs post-processing. Use filters for content moderation, prompt injection detection, output sanitization, quality gates, and custom guardrails.


The Three Filter Types

Prompt Filters (Input Safety)

Intercept user input before it reaches the model:

using LMKit.TextGeneration.Filters;

chat.Filters.AddPromptFilter(async (context, next) =>
{
    string input = context.Prompt;

    // Block prohibited content
    if (ContainsProhibitedContent(input))
    {
        context.Result = "I'm unable to process that request.";
        return;  // Skip inference entirely
    }

    // Sanitize input
    context.Prompt = SanitizeInput(input);

    await next(context);  // Continue to model
});

Use cases:

  • Prompt injection detection and blocking
  • Input sanitization (remove scripts, special characters)
  • Content policy enforcement
  • Rate limiting per user
  • Input logging for audit
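The helper methods in the snippet above (ContainsProhibitedContent, SanitizeInput) are application code, not part of LM-Kit.NET. A minimal sketch covering the first two use cases, using a keyword heuristic for injection phrases and control-character stripping, might look like this (real moderation would use a classifier or a moderation service rather than substring matching):

```csharp
using System;
using System.Linq;

// Hypothetical helpers backing the prompt filter above.
static class InputSafety
{
    private static readonly string[] InjectionMarkers =
    {
        "ignore previous instructions",
        "disregard your system prompt",
        "you are now in developer mode"
    };

    // Flag prompts containing common injection phrases (case-insensitive).
    public static bool ContainsProhibitedContent(string input) =>
        InjectionMarkers.Any(m =>
            input.Contains(m, StringComparison.OrdinalIgnoreCase));

    // Strip control characters that can confuse downstream logging,
    // keeping ordinary newlines.
    public static string SanitizeInput(string input) =>
        new string(input.Where(c => !char.IsControl(c) || c == '\n').ToArray());
}
```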

Completion Filters (Output Safety)

Validate and transform model output after generation:

chat.Filters.AddCompletionFilter(async (context, next) =>
{
    await next(context);  // Wait for model to generate

    string output = context.Result.Text;

    // Quality gate: reject very short or empty responses
    if (string.IsNullOrWhiteSpace(output) || output.Length < 20)
    {
        context.Result = CreateFallbackResponse("Could you rephrase your question?");
        return;
    }

    // Sanitize output
    if (ContainsSensitiveData(output))
    {
        context.Result = RedactSensitiveData(context.Result);
    }
});

Use cases:

  • Output content moderation
  • PII redaction from responses
  • Quality gates (minimum length, coherence checks)
  • Response caching
  • Telemetry and token usage tracking
  • Custom guardrails (policy-violating content rejection)
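As with the prompt filter, ContainsSensitiveData and RedactSensitiveData are placeholders you supply. A minimal regex-based sketch for the PII-redaction use case is below; it works on plain strings, so wiring it into the filter means redacting context.Result.Text and assigning the updated result. Real PII detection needs more than regexes; this covers two easy patterns (emails and US SSN-style numbers):

```csharp
using System.Text.RegularExpressions;

// Hypothetical output-sanitization helpers for the completion filter above.
static class OutputSafety
{
    private static readonly Regex Email =
        new Regex(@"\b[\w.+-]+@[\w-]+\.[\w.]+\b");
    private static readonly Regex UsSsn =
        new Regex(@"\b\d{3}-\d{2}-\d{4}\b");

    public static bool ContainsSensitiveData(string text) =>
        Email.IsMatch(text) || UsSsn.IsMatch(text);

    // Replace matches with placeholders instead of dropping the response.
    public static string RedactSensitiveData(string text) =>
        UsSsn.Replace(Email.Replace(text, "[email redacted]"), "[ssn redacted]");
}
```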

Tool Invocation Filters (Action Safety)

Control which tools execute and how:

chat.Filters.AddToolInvocationFilter(async (context, next) =>
{
    // Rate limit external API calls
    if (context.ToolName == "http_get" && IsRateLimited())
    {
        context.Result = "{\"error\": \"Rate limit exceeded. Try again later.\"}";
        return;  // Skip actual execution
    }

    // Log all tool calls for audit
    AuditLog.Record(context.ToolName, context.Arguments);

    await next(context);  // Execute the tool

    // Validate tool results
    if (context.ToolName.StartsWith("filesystem_"))
    {
        ValidateFileAccess(context.Result);
    }
});

Use cases:

  • Rate limiting external API calls
  • Audit logging of all tool invocations
  • Result caching to avoid repeated calls
  • Error recovery with fallback results
  • Early termination of tool-calling loops
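The IsRateLimited() check in the example above is likewise application code. One way to back it, assuming a single shared limiter, is a sliding-window counter; the timestamp is passed in to keep the sketch deterministic and testable:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sliding-window limiter behind the IsRateLimited() check:
// allows at most `limit` calls within the trailing `window`.
class SlidingWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _calls = new();

    public SlidingWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Returns true (and records the call) if another call is allowed now.
    public bool TryAcquire(DateTime now)
    {
        while (_calls.Count > 0 && now - _calls.Peek() > _window)
            _calls.Dequeue();            // evict calls older than the window
        if (_calls.Count >= _limit)
            return false;
        _calls.Enqueue(now);
        return true;
    }
}
```

A production version would key limiters per user or per tool and guard the queue for concurrent access.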

Filter Pipeline Order

Filters execute in registration order, wrapping each other like an onion:

User Input
  → Prompt Filter 1 (pre)
    → Prompt Filter 2 (pre)
      → Model Inference
    ← Prompt Filter 2 (post)
  ← Prompt Filter 1 (post)
Response

Register the most security-critical filters first: the first filter registered forms the outermost layer, so its pre-processing runs before everything else and its post-processing runs last.
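The ordering can be demonstrated with a stand-alone sketch of the same onion pattern, using plain delegates rather than the LM-Kit.NET filter types:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal stand-in pipeline (not the LM-Kit.NET types) showing why
// registration order matters: the first filter registered is outermost.
class Pipeline
{
    private readonly List<Func<Func<Task>, Task>> _filters = new();
    public List<string> Trace { get; } = new();

    public void Add(string name) => _filters.Add(async next =>
    {
        Trace.Add($"{name} pre");   // runs before inner filters
        await next();
        Trace.Add($"{name} post");  // runs after inner filters
    });

    public async Task RunAsync()
    {
        // Innermost step stands in for model inference.
        Func<Task> next = () => { Trace.Add("inference"); return Task.CompletedTask; };

        // Wrap filters from last-registered (innermost) to first (outermost).
        for (int i = _filters.Count - 1; i >= 0; i--)
        {
            var filter = _filters[i];
            var inner = next;
            next = () => filter(inner);
        }
        await next();
    }
}
```

Registering "Filter1" then "Filter2" and running the pipeline produces the trace Filter1 pre, Filter2 pre, inference, Filter2 post, Filter1 post, matching the diagram above.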


Class-Based Filters

For reusable filters, implement the interface directly:

public class ContentModerationFilter : IPromptFilter
{
    private readonly ContentPolicy _policy;

    public ContentModerationFilter(ContentPolicy policy) => _policy = policy;

    public async Task OnPromptAsync(PromptFilterContext context, Func<PromptFilterContext, Task> next)
    {
        var violations = _policy.Check(context.Prompt);

        if (violations.Any())
        {
            context.Result = $"Content policy violation: {violations.First().Reason}";
            return;
        }

        await next(context);
    }
}

// Register
chat.Filters.Add(new ContentModerationFilter(myPolicy));

Combining Filters with Permission Policies

Filters work alongside ToolPermissionPolicy for layered security:

  1. ToolPermissionPolicy evaluates first: allow, deny, or require approval.
  2. Tool Invocation Filters run second: logging, rate limiting, validation.
  3. Tool executes if both layers pass.

This provides defense in depth: policies handle broad access control, filters handle specific runtime behavior.
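The two-layer evaluation can be pictured with a sketch using illustrative stand-ins rather than the LM-Kit.NET types (the "require approval" outcome is omitted for brevity):

```csharp
using System;

// Illustrative stand-ins for the two layers: a coarse permission policy
// evaluated first, then per-call filter logic, then the tool itself.
enum PolicyDecision { Allow, Deny }

static class LayeredToolGate
{
    public static string Invoke(
        string toolName,
        Func<string, PolicyDecision> policy,  // layer 1: broad access control
        Func<string, bool> filterAllows,      // layer 2: runtime checks (rate limits, etc.)
        Func<string> execute)
    {
        if (policy(toolName) == PolicyDecision.Deny)
            return "{\"error\": \"denied by policy\"}";
        if (!filterAllows(toolName))
            return "{\"error\": \"blocked by filter\"}";
        return execute();                     // both layers passed
    }
}
```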

