Understanding Classification in LM-Kit.NET


TL;DR

Classification is the task of assigning one or more predefined labels to a piece of content, whether that content is text, a document, or an image. In LM-Kit.NET, classification is powered by three main classes in the LMKit.TextAnalysis namespace: Categorization for general-purpose multi-class labeling, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All three support embedding-based and completion-based classification, accept multimodal inputs (text, images, PDFs, Office documents), and run entirely on-device for maximum privacy.


What is Classification?

Definition: Classification is traditionally a supervised-learning task in which a model maps an input to one or more of a finite set of categories. With language models, classification instead draws on the model's semantic understanding to pick the most appropriate label for a piece of content, so no task-specific training on labeled datasets is required.

The Classification Pipeline

+--------------------------------------------------------------------------+
|                         Classification Pipeline                          |
+--------------------------------------------------------------------------+
|                                                                          |
|  +-----------+     +----------------+     +----------------+             |
|  |  Input    |     |  LM-Kit.NET    |     |  Output        |             |
|  |           |---->|  Classifier    |---->|                |             |
|  | • Text    |     |                |     | • Category     |             |
|  | • Image   |     | • Categorize   |     | • Confidence   |             |
|  | • PDF     |     | • Sentiment    |     | • Top-N labels |             |
|  | • Office  |     | • Emotion      |     |                |             |
|  +-----------+     +----------------+     +----------------+             |
|                                                                          |
+--------------------------------------------------------------------------+

Classification vs Generation

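+-------------+------------------------+-------------------------------+
| Aspect      | Text Generation        | Classification                |
+-------------+------------------------+-------------------------------+
| Output      | Free-form text         | Discrete label(s)             |
| Determinism | Variable               | Highly deterministic          |
| Use Case    | Creative writing, chat | Routing, filtering, analytics |
| Validation  | Open-ended             | Constrained to known labels   |
+-------------+------------------------+-------------------------------+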
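
Because a classifier's output is constrained to a known label set, a result can be checked mechanically before it drives any automation. The following is a minimal sketch in plain C#; it does not call LM-Kit.NET, and `ResolveLabel` is a hypothetical helper introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of LM-Kit.NET): maps a raw string to the
// index of a known label, or -1 when the string is not a known label.
int ResolveLabel(IReadOnlyList<string> knownLabels, string candidate)
{
    string trimmed = candidate.Trim();
    for (int i = 0; i < knownLabels.Count; i++)
    {
        if (string.Equals(knownLabels[i], trimmed, StringComparison.OrdinalIgnoreCase))
        {
            return i;
        }
    }
    return -1;
}

var labels = new List<string> { "Spam", "Not Spam" };

Console.WriteLine(ResolveLabel(labels, " spam "));        // 0
Console.WriteLine(ResolveLabel(labels, "Probably spam")); // -1
```

Free-form generated text offers no equivalent check, which is why the table lists its validation as open-ended.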

Types of Classification

1. Single-Label Classification

The most common form: assign exactly one category to each input. Examples include spam detection, language identification, and document routing.

2. Multi-Label Classification

Assign multiple categories to a single input. For instance, a news article might be labeled both "Technology" and "Finance." In LM-Kit.NET, the GetTopCategories method supports this pattern by returning the top N matching labels.
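
The top-N idea can be sketched as ranking per-label scores and keeping the N highest. The example below is plain C# with made-up scores; `TopN` is a hypothetical helper, and nothing here is meant to describe how GetTopCategories computes its result internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the indexes of the n highest scores, best first.
List<int> TopN(IReadOnlyList<double> scores, int n)
{
    return Enumerable.Range(0, scores.Count)
                     .OrderByDescending(i => scores[i])
                     .Take(n)
                     .ToList();
}

var labels = new List<string> { "Technology", "Finance", "Sports", "Health" };
var scores = new List<double> { 0.81, 0.64, 0.03, 0.12 }; // illustrative values only

foreach (int i in TopN(scores, 2))
{
    Console.WriteLine(labels[i]);
}
// Prints:
// Technology
// Finance
```

Returning indexes rather than label strings keeps the result easy to map back onto the original category list.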

3. Sentiment Analysis

A specialized form of classification that determines the emotional polarity of text: Positive, Negative, or Neutral. Sentiment analysis is widely used in customer feedback processing, social media monitoring, and brand reputation tracking.

4. Emotion Detection

A finer-grained classification that identifies specific emotions: Happiness, Anger, Sadness, Fear, or Neutral. This enables nuanced understanding of user intent in support tickets, reviews, and conversational interfaces.


How LM-Kit.NET Implements Classification

LM-Kit.NET provides three dedicated classification classes, each optimized for its domain:

Architecture

+--------------------------------------------------------------------------+
|                  LM-Kit.NET Classification Architecture                  |
+--------------------------------------------------------------------------+
|                                                                          |
|  +-------------------------------------------------------------------+   |
|  |                      Input Layer                                  |   |
|  |  Text • Attachment • ImageBuffer • PDF • Office • HTML            |   |
|  +-------------------------------------------------------------------+   |
|                                  |                                       |
|                                  v                                       |
|  +-------------------------------------------------------------------+   |
|  |                   Classification Engine                           |   |
|  |                                                                   |   |
|  |  +------------------+ +------------------+ +-------------------+  |   |
|  |  |  Categorization  | | SentimentAnalysis| | EmotionDetection  |  |   |
|  |  |                  | |                  | |                   |  |   |
|  |  | • Custom labels  | | • Positive       | | • Happiness       |  |   |
|  |  | • Descriptions   | | • Negative       | | • Anger           |  |   |
|  |  | • Top-N results  | | • Neutral        | | • Sadness         |  |   |
|  |  | • Vision input   | |                  | | • Fear            |  |   |
|  |  +------------------+ +------------------+ +-------------------+  |   |
|  |                                                                   |   |
|  |  Modes: Completion-based  |  Embedding-based                      |   |
|  +-------------------------------------------------------------------+   |
|                                  |                                       |
|                                  v                                       |
|  +-------------------------------------------------------------------+   |
|  |                      Output Layer                                 |   |
|  |  Category Index • Confidence Score • Label Name                   |   |
|  +-------------------------------------------------------------------+   |
|                                                                          |
+--------------------------------------------------------------------------+

General-Purpose Classification with Categorization

using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:4b");

var categorizer = new Categorization(model);

// Classify customer support tickets
var categories = new List<string>
{
    "Billing Issue",
    "Technical Problem",
    "Feature Request",
    "Account Access",
    "General Inquiry"
};

string ticket = "I can't log into my account after resetting my password.";
int bestIndex = categorizer.GetBestCategory(categories, ticket, cancellationToken: CancellationToken.None);

Console.WriteLine($"Category: {categories[bestIndex]}");
Console.WriteLine($"Confidence: {categorizer.Confidence:P1}");
// Example output (actual values depend on the model):
//   Category: Account Access
//   Confidence: 94.2%

Classification with Descriptions

Adding category descriptions improves accuracy for ambiguous labels:

// Assumes `categorizer` is an existing Categorization instance
var categories = new List<string> { "Bug", "Enhancement", "Documentation" };

// One description per category, in the same order as `categories`
var descriptions = new List<string>
{
    "Software defect causing unexpected behavior",
    "Request for new functionality or improvement",
    "Missing, unclear, or incorrect documentation"
};

int index = categorizer.GetBestCategory(
    categories, descriptions,
    "The API returns a 500 error when sending a POST request with an empty body.",
    cancellationToken: CancellationToken.None
);

Console.WriteLine($"Category: {categories[index]}");

Multi-Label Classification

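// Assumes `categorizer`, `categories`, and `content` (the text to classify) already exist.
// Get the top 3 matching categories; each result is an index into `categories`.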
var topCategories = categorizer.GetTopCategories(
    categories, content,
    maxCategories: 3,
    cancellationToken: CancellationToken.None
);

foreach (int idx in topCategories)
{
    Console.WriteLine($"  {categories[idx]}");
}

Document Classification with Vision

using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:12b");

var categorizer = new Categorization(model);
categorizer.PreferredInferenceModality = InferenceModality.Vision;

var categories = new List<string>
{
    "Invoice",
    "Contract",
    "Resume",
    "Receipt",
    "ID Document"
};

// Classify a scanned document by its visual appearance
int result = categorizer.GetBestCategory(
    categories,
    new Attachment("scanned_document.pdf"),
    cancellationToken: CancellationToken.None
);

Console.WriteLine($"Document type: {categories[result]} ({categorizer.Confidence:P1})");

Sentiment Analysis

using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:4b");

var sentiment = new SentimentAnalysis(model);
sentiment.NeutralSupport = true; // Enable three-way classification

var category = sentiment.GetSentimentCategory(
    "The product arrived on time and works perfectly. Very happy with the purchase!",
    CancellationToken.None
);

Console.WriteLine($"Sentiment: {category} ({sentiment.Confidence:P1})");
// Example output (actual values depend on the model): Sentiment: Positive (97.3%)

Emotion Detection

using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3:4b");

var emotion = new EmotionDetection(model);

var category = emotion.GetEmotionCategory(
    "I've been waiting three weeks for a response and nobody seems to care.",
    CancellationToken.None
);

Console.WriteLine($"Emotion: {category} ({emotion.Confidence:P1})");
// Example output (actual values depend on the model): Emotion: Anger (89.7%)

Embedding-Based Classification

All three classifiers support an embedding-based mode that uses vector similarity rather than text completion. This can be faster for large category sets:

var categorizer = new Categorization(model);
categorizer.UseEmbeddingClassifier = true;

// Embedding mode compares input vector against category vectors
int index = categorizer.GetBestCategory(categories, content, cancellationToken: CancellationToken.None);
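
Comparing an input vector against category vectors can be illustrated with cosine similarity. This is a sketch under stated assumptions: the three-dimensional vectors are hand-written toys standing in for real embeddings, `Cosine` and `NearestCategory` are hypothetical helpers, and the choice of cosine similarity is an assumption for illustration rather than a statement about LM-Kit.NET's internals.

```csharp
using System;
using System.Collections.Generic;

// Cosine similarity between two vectors of equal length.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Returns the index of the category vector most similar to the input vector.
int NearestCategory(float[] input, IReadOnlyList<float[]> categoryVectors)
{
    int best = -1;
    double bestScore = double.NegativeInfinity;
    for (int i = 0; i < categoryVectors.Count; i++)
    {
        double score = Cosine(input, categoryVectors[i]);
        if (score > bestScore)
        {
            bestScore = score;
            best = i;
        }
    }
    return best;
}

var labels = new List<string> { "Billing Issue", "Technical Problem" };
var labelVectors = new List<float[]>
{
    new[] { 0.9f, 0.1f, 0.0f }, // toy vector standing in for "Billing Issue"
    new[] { 0.1f, 0.8f, 0.3f }  // toy vector standing in for "Technical Problem"
};
float[] inputVector = { 0.2f, 0.7f, 0.4f }; // toy vector standing in for the input text

Console.WriteLine(labels[NearestCategory(inputVector, labelVectors)]);
// Prints: Technical Problem
```

In a scheme like this the category vectors can be computed once and reused, so each new input costs one embedding plus a handful of vector comparisons, which is consistent with the speed advantage for large category sets.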

Classification Strategies Compared

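+-------------------+----------+----------+-------------------------------------+
| Strategy          | Accuracy | Speed    | Best For                            |
+-------------------+----------+----------+-------------------------------------+
| Completion-based  | Highest  | Moderate | Complex or nuanced categories       |
| Embedding-based   | High     | Fast     | Large category sets, simple labels  |
| With descriptions | Highest  | Moderate | Ambiguous or overlapping categories |
| Vision mode       | High     | Slower   | Scanned documents, images           |
+-------------------+----------+----------+-------------------------------------+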

Classification Use Cases

1. Customer Support Routing

Automatically route incoming tickets to the right department by classifying the topic, urgency, and sentiment.
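
Routing usually combines the predicted category with its confidence score. The sketch below shows one possible shape for that decision in plain C#; the queue names, the `Route` helper, and the 0.75 threshold are all hypothetical choices for illustration, and the category and confidence values are passed in directly rather than obtained from a classifier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routing table: category label -> department queue.
var queues = new Dictionary<string, string>
{
    ["Billing Issue"] = "finance-team",
    ["Technical Problem"] = "support-engineering",
    ["Account Access"] = "identity-team"
};

// Routes a classified ticket; low-confidence or unmapped results go to a human.
// The default threshold of 0.75 is an arbitrary illustrative value.
string Route(string category, double confidence, double threshold = 0.75)
{
    if (confidence < threshold)
    {
        return "manual-review";
    }
    return queues.TryGetValue(category, out var queue) ? queue : "manual-review";
}

Console.WriteLine(Route("Account Access", 0.94)); // identity-team
Console.WriteLine(Route("Billing Issue", 0.52));  // manual-review
```

Sending uncertain results to manual review keeps misrouted tickets rare at the cost of some human effort; the right threshold depends on the model and the workload.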

2. Content Moderation

Categorize user-generated content to flag inappropriate material, spam, or policy violations.

3. Document Triage

Classify incoming documents (invoices, contracts, forms, reports) and route them to the appropriate processing pipeline. This use case pairs naturally with Intelligent Document Processing (IDP).

4. Brand Monitoring

Track sentiment across reviews, social media posts, and feedback channels to measure customer satisfaction over time.
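
Measuring satisfaction over time amounts to aggregating per-item sentiment labels by period. The following sketch groups results by calendar month and computes the share of positive items. The data is hand-written for illustration; in practice each label would come from a sentiment classifier, and the tuple shape used here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative, hand-written results (date of the review, predicted sentiment).
var results = new List<(DateTime Date, string Sentiment)>
{
    (new DateTime(2025, 1, 6), "Positive"),
    (new DateTime(2025, 1, 7), "Negative"),
    (new DateTime(2025, 1, 20), "Positive"),
    (new DateTime(2025, 2, 3), "Negative"),
    (new DateTime(2025, 2, 4), "Neutral")
};

// Share of positive results per calendar month, oldest month first.
var monthly = results
    .GroupBy(r => new DateTime(r.Date.Year, r.Date.Month, 1))
    .OrderBy(g => g.Key)
    .Select(g => (
        Month: g.Key,
        PositiveShare: g.Count(r => r.Sentiment == "Positive") / (double)g.Count()))
    .ToList();

foreach (var m in monthly)
{
    Console.WriteLine($"{m.Month:yyyy-MM}: {m.PositiveShare:P0} positive");
}
```

With this sample data, January has two positive results out of three and February has none out of two, so the positive share drops from one month to the next.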

5. Email Prioritization

Classify emails by intent (inquiry, complaint, order, spam) and urgency to surface the most important messages first.

6. Research Categorization

Tag academic papers, articles, or internal knowledge-base documents with topic labels for organized retrieval.


Key Terms

  • Classification: Assigning discrete labels to content based on its meaning
  • Multi-class Classification: Choosing one label from many possible categories
  • Multi-label Classification: Assigning multiple labels simultaneously to a single input
  • Sentiment Analysis: Classifying text polarity as positive, negative, or neutral
  • Emotion Detection: Identifying specific emotions (happiness, anger, sadness, fear)
  • Confidence Score: A value between 0 and 1 indicating the model's certainty in its classification
  • Embedding Classifier: A classification mode using vector similarity instead of text generation
  • Category Description: Optional explanatory text that helps the model disambiguate similar categories



Summary

Classification is the process of assigning predefined labels to content based on semantic understanding. In LM-Kit.NET, three dedicated classes cover the full classification spectrum: Categorization for custom multi-class labeling with optional descriptions, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All classifiers support both completion-based and embedding-based modes, accept multimodal inputs (text, images, PDFs, Office documents), and produce results with confidence scores for reliable automation. Combined with vision capabilities for document classification and the ability to return multiple top categories, LM-Kit.NET provides a complete on-device classification toolkit for customer support routing, content moderation, document triage, and sentiment monitoring.