Build a Content Moderation Filter

User-generated content (comments, reviews, chat messages, forum posts) needs moderation before it reaches other users. LM-Kit.NET provides text analysis APIs that classify content by harm type, detect toxic sentiment, and flag emotional escalation. Everything runs locally, so user content never leaves your infrastructure. This tutorial builds a working moderation system that combines multiple classifiers for accurate, multi-signal filtering.


Why Local Content Moderation Matters

Two enterprise problems that on-device content moderation solves:

  1. User data stays on-premises. Moderation involves reading every piece of user content. Sending chat messages, support tickets, and community posts to a cloud moderation API means a third party processes all your user communications. Local moderation keeps that data entirely within your infrastructure, simplifying GDPR, COPPA, and platform liability.
  2. Real-time moderation without per-call costs. Cloud moderation APIs charge per request. A community platform processing 500K messages per day accumulates significant costs. A local model handles unlimited volume at fixed hardware cost, making real-time pre-publish filtering economically viable.

Prerequisites

Requirement | Minimum
.NET SDK    | 8.0+
VRAM        | 4+ GB
Disk        | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n ModerationQuickstart
cd ModerationQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand the Approach

LM-Kit.NET does not have a single "moderation" class. Instead, you combine multiple text analysis tools, each contributing a different signal:

  User input
      │
      ├──► Categorization ──► harm type (toxicity, harassment, spam, safe)
      │
      ├──► SentimentAnalysis ──► positive / negative / neutral
      │
      ├──► EmotionDetection ──► anger / fear / sadness / happiness / neutral
      │
      └──► SarcasmDetection ──► sarcastic? (masks hostility as humor)
      │
      ▼
  Moderation decision (allow / flag / block)

This multi-signal approach catches more violations than any single classifier. A message might pass a toxicity check but get flagged by the combination of negative sentiment and anger emotion.
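Before wiring the classifiers together, it can help to sketch the data each message produces. The types below are illustrative only (ModerationSignals and Decision are not LM-Kit types; the enum members SentimentCategory and EmotionCategory come from LMKit.TextAnalysis, as in the code that follows). Step 4 implements the same idea with a local function:

// Illustrative container for the four signals collected per message.
// ModerationSignals and Decision are hypothetical names, not LM-Kit types.
record ModerationSignals(
    string HarmType,            // Categorization result (e.g. "safe", "toxicity")
    float HarmConfidence,
    SentimentCategory Sentiment, // from SentimentAnalysis
    EmotionCategory Emotion,     // from EmotionDetection
    bool IsSarcastic);           // from SarcasmDetection

enum Decision { Allow, Flag, Block }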


Step 3: Basic Harm Category Classification

This program classifies user content into harm categories using Categorization:

using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Define harm categories
// ──────────────────────────────────────
string[] categories =
{
    "safe",
    "toxicity",
    "harassment",
    "hate_speech",
    "spam",
    "self_harm"
};

string[] descriptions =
{
    "Normal, constructive, or neutral content that does not violate any policy",
    "Rude, disrespectful, or aggressive language intended to provoke or offend",
    "Content targeting a specific individual with threats, intimidation, or repeated unwanted contact",
    "Content attacking people based on race, ethnicity, religion, gender, sexual orientation, or disability",
    "Unsolicited commercial content, scams, or repetitive irrelevant messages",
    "Content that encourages, glorifies, or provides instructions for self-harm or suicide"
};

var categorizer = new Categorization(model)
{
    AllowUnknownCategory = false
};

// ──────────────────────────────────────
// 3. Test with sample content
// ──────────────────────────────────────
string[] samples =
{
    "Great tutorial, thanks for sharing this with the community!",
    "You're an absolute moron and everyone knows it.",
    "Buy cheap watches at www.totally-legit-deals.biz! Limited offer!!!",
    "I think the second approach works better for large datasets.",
    "People from that country are all the same. They should go back.",
    "I keep messaging you because you need to answer me RIGHT NOW."
};

Console.WriteLine("Classifying content:\n");

foreach (string text in samples)
{
    int index = categorizer.GetBestCategory(categories, descriptions, text);
    string label = categories[index];
    float confidence = categorizer.Confidence;

    ConsoleColor color = label == "safe" ? ConsoleColor.Green : ConsoleColor.Red;
    Console.ForegroundColor = color;
    Console.Write($"  [{label,-12}]");
    Console.ResetColor();
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.Write($" ({confidence:P0}) ");
    Console.ResetColor();

    string preview = text.Length > 55 ? text.Substring(0, 55) + "..." : text;
    Console.WriteLine(preview);
}

Step 4: Multi-Signal Moderation Pipeline

Combine all four classifiers for higher accuracy. A single classifier can miss edge cases that multiple signals, evaluated together, catch:

using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create all classifiers
// ──────────────────────────────────────
string[] harmCategories = { "safe", "toxicity", "harassment", "hate_speech", "spam" };
string[] harmDescriptions =
{
    "Normal, constructive, or neutral content",
    "Rude, disrespectful, or aggressive language",
    "Threats, intimidation, or targeting a specific individual",
    "Attacks based on race, religion, gender, or identity",
    "Unsolicited commercial content or scams"
};

var categorizer = new Categorization(model) { AllowUnknownCategory = false };
var sentiment = new SentimentAnalysis(model) { NeutralSupport = true };
var emotion = new EmotionDetection(model) { NeutralSupport = true };
var sarcasm = new SarcasmDetection(model);

// ──────────────────────────────────────
// 3. Define the moderation function
// ──────────────────────────────────────
(string Decision, string Reason) Moderate(string text)
{
    // Signal 1: Harm category
    int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    string harmType = harmCategories[catIndex];
    float harmConfidence = categorizer.Confidence;

    // Signal 2: Sentiment
    SentimentCategory sent = sentiment.GetSentimentCategory(text);
    float sentConfidence = sentiment.Confidence;

    // Signal 3: Emotion
    EmotionCategory emo = emotion.GetEmotionCategory(text);
    float emoConfidence = emotion.Confidence;

    // Signal 4: Sarcasm
    bool isSarcastic = sarcasm.IsSarcastic(text);
    float sarcConfidence = sarcasm.Confidence;

    // Decision logic
    // Block: explicit harm category with high confidence
    if (harmType != "safe" && harmConfidence > 0.80f)
        return ("BLOCK", $"{harmType} ({harmConfidence:P0})");

    // Block: negative sentiment + anger + sarcasm (veiled hostility)
    if (sent == SentimentCategory.Negative && emo == EmotionCategory.Anger && isSarcastic)
        return ("BLOCK", "veiled hostility (negative + anger + sarcasm)");

    // Flag: harm category detected but lower confidence
    if (harmType != "safe" && harmConfidence > 0.50f)
        return ("FLAG", $"possible {harmType} ({harmConfidence:P0})");

    // Flag: strong anger without explicit harm category
    if (emo == EmotionCategory.Anger && emoConfidence > 0.85f && sent == SentimentCategory.Negative)
        return ("FLAG", $"anger ({emoConfidence:P0}) + negative sentiment");

    return ("ALLOW", "no issues detected");
}

// ──────────────────────────────────────
// 4. Interactive moderation loop
// ──────────────────────────────────────
Console.WriteLine("Enter text to moderate (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("Text: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    var (decision, reason) = Moderate(input);

    Console.ForegroundColor = decision switch
    {
        "BLOCK" => ConsoleColor.Red,
        "FLAG" => ConsoleColor.Yellow,
        _ => ConsoleColor.Green
    };
    Console.Write($"  {decision}");
    Console.ResetColor();
    Console.WriteLine($"  {reason}\n");
}

Step 5: Batch Moderation for Existing Content

Moderate a backlog of user content and export results:

string[] messages = File.ReadAllLines("user_comments.txt");
var results = new List<string>();
results.Add("text,decision,reason");

int blocked = 0, flagged = 0, allowed = 0;

foreach (string message in messages)
{
    if (string.IsNullOrWhiteSpace(message)) continue;

    var (decision, reason) = Moderate(message);

    results.Add($"\"{message.Replace("\"", "\"\"")}\",{decision},\"{reason}\"");

    switch (decision)
    {
        case "BLOCK": blocked++; break;
        case "FLAG": flagged++; break;
        default: allowed++; break;
    }
}

File.WriteAllLines("moderation_results.csv", results);

int total = blocked + flagged + allowed;
Console.WriteLine($"Processed {total} messages:");
Console.WriteLine($"  Blocked: {blocked} ({(double)blocked / total:P0})");
Console.WriteLine($"  Flagged: {flagged} ({(double)flagged / total:P0})");
Console.WriteLine($"  Allowed: {allowed} ({(double)allowed / total:P0})");

Step 6: Multi-Label Violation Detection

Some content violates multiple policies at once. Use GetTopCategories to detect all applicable violations:

string[] policies =
{
    "profanity", "sexual_content", "violence", "harassment",
    "hate_speech", "spam", "misinformation", "safe"
};

string[] policyDescriptions =
{
    "Contains vulgar or obscene language",
    "Contains sexually explicit or suggestive content",
    "Describes or promotes physical violence",
    "Targets an individual with hostility or threats",
    "Attacks a group based on protected characteristics",
    "Commercial solicitation or repetitive promotional content",
    "False or misleading claims presented as fact",
    "Normal content that does not violate any policy"
};

string text = "You people are disgusting and should all be hurt. Buy my product!";

List<int> violations = categorizer.GetTopCategories(
    policies, policyDescriptions, text, maxCategories: 3);

Console.WriteLine("Detected violations:");
foreach (int idx in violations)
{
    if (policies[idx] != "safe")
    {
        Console.WriteLine($"  {policies[idx]}");
    }
}
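To fold the multi-label result back into the allow / flag / block decision, one option is to rank the detected labels by severity. The map below is a sketch: the severity values are hypothetical and should follow your own policy, and the snippet reuses violations and policies from the code above.

// Hypothetical severity ranking per policy label; tune to your platform.
var severity = new Dictionary<string, int>
{
    ["hate_speech"] = 3, ["violence"] = 3, ["harassment"] = 3,
    ["sexual_content"] = 2, ["profanity"] = 2, ["misinformation"] = 2,
    ["spam"] = 1,
    ["safe"] = 0
};

// Take the most severe label among the detected violations.
int worst = violations
    .Select(idx => severity.GetValueOrDefault(policies[idx], 0))
    .DefaultIfEmpty(0)
    .Max();

string decision = worst >= 3 ? "BLOCK" : worst >= 1 ? "FLAG" : "ALLOW";
Console.WriteLine($"Decision: {decision}");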

Step 7: Tuning Thresholds for Your Platform

Different platforms need different sensitivity levels. A children's app needs strict filtering. A professional debate forum needs more tolerance for strong opinions.

// Strict: children's platform or customer-facing chat
const float StrictHarmThreshold = 0.50f;
const float StrictAngerThreshold = 0.70f;

// Moderate: general community forum
const float ModerateHarmThreshold = 0.75f;
const float ModerateAngerThreshold = 0.85f;

// Permissive: adult discussion, debate platforms
const float PermissiveHarmThreshold = 0.90f;
const float PermissiveAngerThreshold = 0.95f;

// Select the profile for your platform
float harmThreshold = ModerateHarmThreshold;
float angerThreshold = ModerateAngerThreshold;

int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
string harmType = harmCategories[catIndex];

if (harmType != "safe" && categorizer.Confidence > harmThreshold)
{
    Console.WriteLine("Content blocked.");
}

Measure your false positive rate by running a batch of known-safe content through the pipeline. If more than 2% of safe content gets blocked, raise your thresholds. If harmful content slips through, lower them.
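A minimal sketch of that measurement, assuming a file of messages you have already reviewed as safe (safe_samples.txt is a placeholder name) and the Moderate function from Step 4:

string[] knownSafe = File.ReadAllLines("safe_samples.txt"); // your reviewed, known-safe set

int falsePositives = 0, tested = 0;

foreach (string sample in knownSafe)
{
    if (string.IsNullOrWhiteSpace(sample)) continue;
    tested++;

    var (decision, _) = Moderate(sample);
    if (decision == "BLOCK") falsePositives++; // track FLAG separately if reviewer load matters
}

double rate = tested > 0 ? (double)falsePositives / tested : 0;
Console.WriteLine($"False positive rate: {rate:P1} ({falsePositives}/{tested})");
if (rate > 0.02) Console.WriteLine("Consider raising your thresholds.");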


Model Selection for Content Moderation

Model ID   | VRAM    | Speed    | Accuracy  | Best For
gemma3:1b  | ~1.5 GB | Fastest  | Good      | High-volume pre-screening
gemma3:4b  | ~3.5 GB | Fast     | Very good | General moderation (recommended)
qwen3:4b   | ~3.5 GB | Fast     | Very good | Multilingual communities
gemma3:12b | ~8 GB   | Moderate | Excellent | Nuanced content (sarcasm, context)

For most moderation tasks, gemma3:4b balances speed and accuracy well. Use a 1B model as a fast pre-filter and escalate uncertain cases to a larger model.
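A sketch of that cascade, reusing GetBestCategory and Confidence as in Step 3. It assumes you can hold both models in memory at once; the progress callbacks are omitted for brevity, and the 0.80 escalation threshold is illustrative:

// Small model for fast pre-screening, larger model for uncertain cases.
using LM fastModel = LM.LoadFromModelID("gemma3:1b");
using LM strongModel = LM.LoadFromModelID("gemma3:4b");

var fastCategorizer = new Categorization(fastModel) { AllowUnknownCategory = false };
var strongCategorizer = new Categorization(strongModel) { AllowUnknownCategory = false };

(string HarmType, float Confidence) ClassifyWithCascade(string text)
{
    int idx = fastCategorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    float confidence = fastCategorizer.Confidence;

    // Escalate only when the small model is unsure.
    if (confidence < 0.80f)
    {
        idx = strongCategorizer.GetBestCategory(harmCategories, harmDescriptions, text);
        confidence = strongCategorizer.Confidence;
    }

    return (harmCategories[idx], confidence);
}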


Common Issues

Problem                           | Cause                                  | Fix
Too many false positives          | Thresholds too strict                  | Raise harmThreshold; test with known-safe content
Sarcastic insults pass through    | Sarcasm detector not included          | Add SarcasmDetection as a signal; combine with sentiment
Slow on high-volume streams       | Running all 4 classifiers per message  | Use category check first; only run full pipeline if harmType != "safe"
Wrong category for short messages | Insufficient context                   | Add categorizer.Guidance with platform context; use larger model
Spam not detected                 | Spam patterns differ by platform       | Add platform-specific spam descriptions; include URL patterns in guidance
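For the "Slow on high-volume streams" row, a sketch of the category-first fast path, reusing the classifiers, categories, and Moderate function from Step 4 (the 0.90 confidence guard is illustrative):

(string Decision, string Reason) ModerateFast(string text)
{
    // Cheap first pass: harm category only.
    int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    string harmType = harmCategories[catIndex];

    // Confidently safe content skips the sentiment, emotion, and sarcasm checks.
    if (harmType == "safe" && categorizer.Confidence > 0.90f)
        return ("ALLOW", "safe with high confidence");

    // Everything else goes through the full multi-signal pipeline.
    return Moderate(text);
}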

Next Steps