Build a Content Moderation Filter

User-generated content (comments, reviews, chat messages, forum posts) needs moderation before it reaches other users. LM-Kit.NET provides text analysis APIs that classify content by harm type, detect toxic sentiment, and flag emotional escalation. Everything runs locally, so user content never leaves your infrastructure. This tutorial builds a working moderation system that combines multiple classifiers for accurate, multi-signal filtering.


Why Local Content Moderation Matters

Two enterprise problems that on-device content moderation solves:

  1. User data stays on-premises. Moderation involves reading every piece of user content. Sending chat messages, support tickets, and community posts to a cloud moderation API means a third party processes all your user communications. Local moderation keeps that data entirely within your infrastructure, simplifying GDPR, COPPA, and platform liability.
  2. Real-time moderation without per-call costs. Cloud moderation APIs charge per request. A community platform processing 500K messages per day accumulates significant costs. A local model handles unlimited volume at fixed hardware cost, making real-time pre-publish filtering economically viable.

Prerequisites

Requirement | Minimum
.NET SDK    | 8.0+
VRAM        | 4+ GB
Disk        | ~3 GB free for model download

Step 1: Create the Project

dotnet new console -n ModerationQuickstart
cd ModerationQuickstart
dotnet add package LM-Kit.NET

Step 2: Understand the Approach

LM-Kit.NET does not have a single "moderation" class. Instead, you combine multiple text analysis tools, each contributing a different signal:

  User input
      │
      ├──► Categorization ──► harm type (toxicity, harassment, spam, safe)
      │
      ├──► SentimentAnalysis ──► positive / negative / neutral
      │
      ├──► EmotionDetection ──► anger / fear / sadness / happiness / neutral
      │
      └──► SarcasmDetection ──► sarcastic? (masks hostility as humor)
      │
      ▼
  Moderation decision (allow / flag / block)

This multi-signal approach catches more violations than any single classifier. A message might pass a toxicity check but get flagged by the combination of negative sentiment and anger emotion.
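Before wiring the classifiers together, it can help to sketch the data each message produces. The types below are illustrative only (ModerationSignals and Decision are not LM-Kit types; the enum members SentimentCategory and EmotionCategory come from LMKit.TextAnalysis, as in the code that follows). Step 4 implements the same idea with a local function:

// Illustrative container for the four signals collected per message.
// ModerationSignals and Decision are hypothetical names, not LM-Kit types.
record ModerationSignals(
    string HarmType,            // Categorization result (e.g. "safe", "toxicity")
    float HarmConfidence,
    SentimentCategory Sentiment, // from SentimentAnalysis
    EmotionCategory Emotion,     // from EmotionDetection
    bool IsSarcastic);           // from SarcasmDetection

enum Decision { Allow, Flag, Block }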


Step 3: Basic Harm Category Classification

This program classifies user content into harm categories using Categorization:

using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Define harm categories
// ──────────────────────────────────────
string[] categories =
{
    "safe",
    "toxicity",
    "harassment",
    "hate_speech",
    "spam",
    "self_harm"
};

string[] descriptions =
{
    "Normal, constructive, or neutral content that does not violate any policy",
    "Rude, disrespectful, or aggressive language intended to provoke or offend",
    "Content targeting a specific individual with threats, intimidation, or repeated unwanted contact",
    "Content attacking people based on race, ethnicity, religion, gender, sexual orientation, or disability",
    "Unsolicited commercial content, scams, or repetitive irrelevant messages",
    "Content that encourages, glorifies, or provides instructions for self-harm or suicide"
};

var categorizer = new Categorization(model)
{
    AllowUnknownCategory = false
};

// ──────────────────────────────────────
// 3. Test with sample content
// ──────────────────────────────────────
string[] samples =
{
    "Great tutorial, thanks for sharing this with the community!",
    "You're an absolute moron and everyone knows it.",
    "Buy cheap watches at www.totally-legit-deals.biz! Limited offer!!!",
    "I think the second approach works better for large datasets.",
    "People from that country are all the same. They should go back.",
    "I keep messaging you because you need to answer me RIGHT NOW."
};

Console.WriteLine("Classifying content:\n");

foreach (string text in samples)
{
    int index = categorizer.GetBestCategory(categories, descriptions, text);
    string label = categories[index];
    float confidence = categorizer.Confidence;

    ConsoleColor color = label == "safe" ? ConsoleColor.Green : ConsoleColor.Red;
    Console.ForegroundColor = color;
    Console.Write($"  [{label,-12}]");
    Console.ResetColor();
    Console.ForegroundColor = ConsoleColor.DarkGray;
    Console.Write($" ({confidence:P0}) ");
    Console.ResetColor();

    string preview = text.Length > 55 ? text.Substring(0, 55) + "..." : text;
    Console.WriteLine(preview);
}

Step 4: Multi-Signal Moderation Pipeline

Combine all four classifiers for higher accuracy. A single classifier can miss edge cases that multiple signals, evaluated together, catch:

using System.Text;
using LMKit.Model;
using LMKit.TextAnalysis;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("gemma3:4b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Create all classifiers
// ──────────────────────────────────────
string[] harmCategories = { "safe", "toxicity", "harassment", "hate_speech", "spam" };
string[] harmDescriptions =
{
    "Normal, constructive, or neutral content",
    "Rude, disrespectful, or aggressive language",
    "Threats, intimidation, or targeting a specific individual",
    "Attacks based on race, religion, gender, or identity",
    "Unsolicited commercial content or scams"
};

var categorizer = new Categorization(model) { AllowUnknownCategory = false };
var sentiment = new SentimentAnalysis(model) { NeutralSupport = true };
var emotion = new EmotionDetection(model) { NeutralSupport = true };
var sarcasm = new SarcasmDetection(model);

// ──────────────────────────────────────
// 3. Define the moderation function
// ──────────────────────────────────────
(string Decision, string Reason) Moderate(string text)
{
    // Signal 1: Harm category
    int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    string harmType = harmCategories[catIndex];
    float harmConfidence = categorizer.Confidence;

    // Signal 2: Sentiment
    SentimentCategory sent = sentiment.GetSentimentCategory(text);
    float sentConfidence = sentiment.Confidence;

    // Signal 3: Emotion
    EmotionCategory emo = emotion.GetEmotionCategory(text);
    float emoConfidence = emotion.Confidence;

    // Signal 4: Sarcasm
    bool isSarcastic = sarcasm.IsSarcastic(text);
    float sarcConfidence = sarcasm.Confidence;

    // Decision logic
    // Block: explicit harm category with high confidence
    if (harmType != "safe" && harmConfidence > 0.80f)
        return ("BLOCK", $"{harmType} ({harmConfidence:P0})");

    // Block: negative sentiment + anger + sarcasm (veiled hostility)
    if (sent == SentimentCategory.Negative && emo == EmotionCategory.Anger && isSarcastic)
        return ("BLOCK", "veiled hostility (negative + anger + sarcasm)");

    // Flag: harm category detected but lower confidence
    if (harmType != "safe" && harmConfidence > 0.50f)
        return ("FLAG", $"possible {harmType} ({harmConfidence:P0})");

    // Flag: strong anger without explicit harm category
    if (emo == EmotionCategory.Anger && emoConfidence > 0.85f && sent == SentimentCategory.Negative)
        return ("FLAG", $"anger ({emoConfidence:P0}) + negative sentiment");

    return ("ALLOW", "no issues detected");
}

// ──────────────────────────────────────
// 4. Interactive moderation loop
// ──────────────────────────────────────
Console.WriteLine("Enter text to moderate (or 'quit' to exit):\n");

while (true)
{
    Console.ForegroundColor = ConsoleColor.Green;
    Console.Write("Text: ");
    Console.ResetColor();

    string? input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) || input.Equals("quit", StringComparison.OrdinalIgnoreCase))
        break;

    var (decision, reason) = Moderate(input);

    Console.ForegroundColor = decision switch
    {
        "BLOCK" => ConsoleColor.Red,
        "FLAG" => ConsoleColor.Yellow,
        _ => ConsoleColor.Green
    };
    Console.Write($"  {decision}");
    Console.ResetColor();
    Console.WriteLine($"  {reason}\n");
}

Step 5: Batch Moderation for Existing Content

Moderate a backlog of user content and export results:

string[] messages = File.ReadAllLines("user_comments.txt");
var results = new List<string>();
results.Add("text,decision,reason");

int blocked = 0, flagged = 0, allowed = 0;

foreach (string message in messages)
{
    if (string.IsNullOrWhiteSpace(message)) continue;

    var (decision, reason) = Moderate(message);

    results.Add($"\"{message.Replace("\"", "\"\"")}\",{decision},\"{reason}\"");

    switch (decision)
    {
        case "BLOCK": blocked++; break;
        case "FLAG": flagged++; break;
        default: allowed++; break;
    }
}

File.WriteAllLines("moderation_results.csv", results);

int total = blocked + flagged + allowed;
Console.WriteLine($"Processed {total} messages:");
Console.WriteLine($"  Blocked: {blocked} ({(double)blocked / total:P0})");
Console.WriteLine($"  Flagged: {flagged} ({(double)flagged / total:P0})");
Console.WriteLine($"  Allowed: {allowed} ({(double)allowed / total:P0})");

Step 6: Multi-Label Violation Detection

Some content violates multiple policies at once. Use GetTopCategories to detect all applicable violations:

string[] policies =
{
    "profanity", "sexual_content", "violence", "harassment",
    "hate_speech", "spam", "misinformation", "safe"
};

string[] policyDescriptions =
{
    "Contains vulgar or obscene language",
    "Contains sexually explicit or suggestive content",
    "Describes or promotes physical violence",
    "Targets an individual with hostility or threats",
    "Attacks a group based on protected characteristics",
    "Commercial solicitation or repetitive promotional content",
    "False or misleading claims presented as fact",
    "Normal content that does not violate any policy"
};

string text = "You people are disgusting and should all be hurt. Buy my product!";

List<int> violations = categorizer.GetTopCategories(
    policies, policyDescriptions, text, maxCategories: 3);

Console.WriteLine("Detected violations:");
foreach (int idx in violations)
{
    if (policies[idx] != "safe")
    {
        Console.WriteLine($"  {policies[idx]}");
    }
}
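To fold the multi-label result back into the allow / flag / block decision, one option is to rank the detected labels by severity. The map below is a sketch: the severity values are hypothetical and should follow your own policy, and the snippet reuses violations and policies from the code above.

// Hypothetical severity ranking per policy label; tune to your platform.
var severity = new Dictionary<string, int>
{
    ["hate_speech"] = 3, ["violence"] = 3, ["harassment"] = 3,
    ["sexual_content"] = 2, ["profanity"] = 2, ["misinformation"] = 2,
    ["spam"] = 1,
    ["safe"] = 0
};

// Take the most severe label among the detected violations.
int worst = violations
    .Select(idx => severity.GetValueOrDefault(policies[idx], 0))
    .DefaultIfEmpty(0)
    .Max();

string decision = worst >= 3 ? "BLOCK" : worst >= 1 ? "FLAG" : "ALLOW";
Console.WriteLine($"Decision: {decision}");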

Step 7: Tuning Thresholds for Your Platform

Different platforms need different sensitivity levels. A children's app needs strict filtering. A professional debate forum needs more tolerance for strong opinions.

// Strict: children's platform or customer-facing chat
const float StrictHarmThreshold = 0.50f;
const float StrictAngerThreshold = 0.70f;

// Moderate: general community forum
const float ModerateHarmThreshold = 0.75f;
const float ModerateAngerThreshold = 0.85f;

// Permissive: adult discussion, debate platforms
const float PermissiveHarmThreshold = 0.90f;
const float PermissiveAngerThreshold = 0.95f;

// Select the profile for your platform
float harmThreshold = ModerateHarmThreshold;
float angerThreshold = ModerateAngerThreshold;

int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
string harmType = harmCategories[catIndex];

if (harmType != "safe" && categorizer.Confidence > harmThreshold)
{
    Console.WriteLine("Content blocked.");
}

Measure your false positive rate by running a batch of known-safe content through the pipeline. If more than 2% of safe content gets blocked, raise your thresholds. If harmful content slips through, lower them.
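A minimal sketch of that measurement, assuming a file of messages you have already reviewed as safe (safe_samples.txt is a placeholder name) and the Moderate function from Step 4:

string[] knownSafe = File.ReadAllLines("safe_samples.txt"); // your reviewed, known-safe set

int falsePositives = 0, tested = 0;

foreach (string sample in knownSafe)
{
    if (string.IsNullOrWhiteSpace(sample)) continue;
    tested++;

    var (decision, _) = Moderate(sample);
    if (decision == "BLOCK") falsePositives++; // track FLAG separately if reviewer load matters
}

double rate = tested > 0 ? (double)falsePositives / tested : 0;
Console.WriteLine($"False positive rate: {rate:P1} ({falsePositives}/{tested})");
if (rate > 0.02) Console.WriteLine("Consider raising your thresholds.");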


Model Selection for Content Moderation

Model ID   | VRAM    | Speed    | Accuracy  | Best For
gemma3:1b  | ~1.5 GB | Fastest  | Good      | High-volume pre-screening
gemma3:4b  | ~3.5 GB | Fast     | Very good | General moderation (recommended)
qwen3:4b   | ~3.5 GB | Fast     | Very good | Multilingual communities
gemma3:12b | ~8 GB   | Moderate | Excellent | Nuanced content (sarcasm, context)

For most moderation tasks, gemma3:4b balances speed and accuracy well. Use a 1B model as a fast pre-filter and escalate uncertain cases to a larger model.
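A sketch of that cascade, reusing GetBestCategory and Confidence as in Step 3. It assumes you can hold both models in memory at once; the progress callbacks are omitted for brevity, and the 0.80 escalation threshold is illustrative:

// Small model for fast pre-screening, larger model for uncertain cases.
using LM fastModel = LM.LoadFromModelID("gemma3:1b");
using LM strongModel = LM.LoadFromModelID("gemma3:4b");

var fastCategorizer = new Categorization(fastModel) { AllowUnknownCategory = false };
var strongCategorizer = new Categorization(strongModel) { AllowUnknownCategory = false };

(string HarmType, float Confidence) ClassifyWithCascade(string text)
{
    int idx = fastCategorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    float confidence = fastCategorizer.Confidence;

    // Escalate only when the small model is unsure.
    if (confidence < 0.80f)
    {
        idx = strongCategorizer.GetBestCategory(harmCategories, harmDescriptions, text);
        confidence = strongCategorizer.Confidence;
    }

    return (harmCategories[idx], confidence);
}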


Common Issues

Problem                           | Cause                                  | Fix
Too many false positives          | Thresholds too strict                  | Raise harmThreshold; test with known-safe content
Sarcastic insults pass through    | Sarcasm detector not included          | Add SarcasmDetection as a signal; combine with sentiment
Slow on high-volume streams       | Running all 4 classifiers per message  | Use category check first; only run full pipeline if harmType != "safe"
Wrong category for short messages | Insufficient context                   | Add categorizer.Guidance with platform context; use larger model
Spam not detected                 | Spam patterns differ by platform       | Add platform-specific spam descriptions; include URL patterns in guidance
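For the "Slow on high-volume streams" row, a sketch of the category-first fast path, reusing the classifiers, categories, and Moderate function from Step 4 (the 0.90 confidence guard is illustrative):

(string Decision, string Reason) ModerateFast(string text)
{
    // Cheap first pass: harm category only.
    int catIndex = categorizer.GetBestCategory(harmCategories, harmDescriptions, text);
    string harmType = harmCategories[catIndex];

    // Confidently safe content skips the sentiment, emotion, and sarcasm checks.
    if (harmType == "safe" && categorizer.Confidence > 0.90f)
        return ("ALLOW", "safe with high confidence");

    // Everything else goes through the full multi-signal pipeline.
    return Moderate(text);
}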

Next Steps