Understanding Classification in LM-Kit.NET

TL;DR

Classification is the task of assigning one or more predefined labels to a piece of content, whether that content is text, a document, or an image. In LM-Kit.NET, classification is provided by three main classes in the LMKit.TextAnalysis namespace: Categorization for general-purpose multi-class labeling, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All three support embedding-based and completion-based classification, accept multimodal inputs (text, images, PDFs, Office documents), and run entirely on-device, so content never has to leave the machine.

What is Classification?

Definition: Classification is the task of mapping an input to one or more labels drawn from a finite, predefined set. It is traditionally a supervised-learning problem that requires a labeled training set. In the context of language models, classification instead relies on the model's semantic understanding to determine the most appropriate label for a given piece of content, so no task-specific training on labeled data is required.

The Classification Pipeline

+--------------------------------------------------------------------------+
|                     Classification Pipeline                              |
+--------------------------------------------------------------------------+
|                                                                          |
|  +-----------+     +----------------+     +----------------+             |
|  |  Input    |     |  LM-Kit.NET    |     |  Output        |             |
|  |           |---->|  Classifier    |---->|                |             |
|  | • Text    |     |                |     | • Category     |             |
|  | • Image   |     | • Categorize   |     | • Confidence   |             |
|  | • PDF     |     | • Sentiment    |     | • Top-N labels |             |
|  | • Office  |     | • Emotion      |     |                |             |
|  +-----------+     +----------------+     +----------------+             |
|                                                                          |
+--------------------------------------------------------------------------+

Classification vs Generation

Aspect	Text Generation	Classification
Output	Free-form text	Discrete label(s)
Determinism	Variable	Highly deterministic
Use Case	Creative writing, chat	Routing, filtering, analytics
Validation	Open-ended	Constrained to known labels

Types of Classification

1. Single-Label Classification

The most common form: assign exactly one category to each input. Examples include spam detection, language identification, and document routing.

2. Multi-Label Classification

Assign multiple categories to a single input. For instance, a news article might be labeled both "Technology" and "Finance." In LM-Kit.NET, the GetTopCategories method supports this pattern by returning the top N matching labels.

3. Sentiment Analysis

A specialized form of classification that determines the emotional polarity of text: Positive, Negative, or Neutral. Sentiment analysis is widely used in customer feedback processing, social media monitoring, and brand reputation tracking.

4. Emotion Detection

A finer-grained classification that identifies specific emotions: Happiness, Anger, Sadness, Fear, or Neutral. This enables nuanced understanding of user intent in support tickets, reviews, and conversational interfaces.

How LM-Kit.NET Implements Classification

LM-Kit.NET provides three dedicated classification classes, each optimized for its domain: Categorization, SentimentAnalysis, and EmotionDetection.

Architecture

+--------------------------------------------------------------------------+
|                  LM-Kit.NET Classification Architecture                  |
+--------------------------------------------------------------------------+
|                                                                          |
|  +-------------------------------------------------------------------+   |
|  |                      Input Layer                                  |   |
|  |  Text • Attachment • ImageBuffer • PDF • Office • HTML            |   |
|  +-------------------------------------------------------------------+   |
|                                  |                                       |
|                                  v                                       |
|  +-------------------------------------------------------------------+   |
|  |                   Classification Engine                           |   |
|  |                                                                   |   |
|  |  +------------------+ +------------------+ +-------------------+  |   |
|  |  |  Categorization  | | SentimentAnalysis| | EmotionDetection  |  |   |
|  |  |                  | |                  | |                   |  |   |
|  |  | • Custom labels  | | • Positive       | | • Happiness       |  |   |
|  |  | • Descriptions   | | • Negative       | | • Anger           |  |   |
|  |  | • Top-N results  | | • Neutral        | | • Sadness         |  |   |
|  |  | • Vision input   | |                  | | • Fear            |  |   |
|  |  +------------------+ +------------------+ +-------------------+  |   |
|  |                                                                   |   |
|  |  Modes: Completion-based  |  Embedding-based                      |   |
|  +-------------------------------------------------------------------+   |
|                                  |                                       |
|                                  v                                       |
|  +-------------------------------------------------------------------+   |
|  |                      Output Layer                                 |   |
|  |  Category Index • Confidence Score • Label Name                   |   |
|  +-------------------------------------------------------------------+   |
|                                                                          |
+--------------------------------------------------------------------------+

General-Purpose Classification with Categorization

using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:4b");

var categorizer = new Categorization(model);

// Classify customer support tickets
var categories = new List<string>
{
    "Billing Issue",
    "Technical Problem",
    "Feature Request",
    "Account Access",
    "General Inquiry"
};

string ticket = "I can't log into my account after resetting my password.";
int bestIndex = categorizer.GetBestCategory(categories, ticket, cancellationToken: CancellationToken.None);

Console.WriteLine($"Category: {categories[bestIndex]}");
Console.WriteLine($"Confidence: {categorizer.Confidence:P1}");
// Example output (actual values depend on the model):
//   Category: Account Access
//   Confidence: 94.2%
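
The confidence value is what makes a prediction safe to act on automatically. The following is a minimal sketch and not part of LM-Kit.NET: AcceptOrEscalate is a hypothetical helper, and the 0.80 threshold and the "Manual Review" fallback are assumptions chosen for illustration. The index and confidence are hard-coded here; real code would pass the value returned by GetBestCategory and the Confidence property read afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an LM-Kit.NET API): accept the predicted label only
// when the confidence clears a threshold; otherwise defer to a human.
string AcceptOrEscalate(IReadOnlyList<string> labelSet, int bestIndex, double confidence, double threshold = 0.80)
{
    // Defensive check: an index outside the label list is never accepted.
    if (bestIndex < 0 || bestIndex >= labelSet.Count)
        return "Manual Review";

    return confidence >= threshold ? labelSet[bestIndex] : "Manual Review";
}

var ticketLabels = new List<string> { "Billing Issue", "Technical Problem", "Account Access" };

Console.WriteLine(AcceptOrEscalate(ticketLabels, 2, 0.94)); // Account Access
Console.WriteLine(AcceptOrEscalate(ticketLabels, 0, 0.41)); // Manual Review
```

A suitable threshold depends on the model and the cost of a wrong label, so it is best tuned against a small set of hand-labeled samples.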

Classification with Descriptions

Adding category descriptions can improve accuracy when labels are ambiguous or overlap:

// Reuses the categorizer created in the previous example.
var categories = new List<string> { "Bug", "Enhancement", "Documentation" };
var descriptions = new List<string>
{
    "Software defect causing unexpected behavior",
    "Request for new functionality or improvement",
    "Missing, unclear, or incorrect documentation"
};

int index = categorizer.GetBestCategory(
    categories, descriptions,
    "The API returns a 500 error when sending a POST request with an empty body.",
    cancellationToken: CancellationToken.None
);

Console.WriteLine($"Category: {categories[index]} ({categorizer.Confidence:P1})");

Multi-Label Classification

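// Reuses the categorizer and a categories list from the examples above.
// The sample text is illustrative.
string content = "The API reference never mentions that an empty POST body returns a 500 error.";

// Get the top 3 matching categories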
var topCategories = categorizer.GetTopCategories(
    categories, content,
    maxCategories: 3,
    cancellationToken: CancellationToken.None
);

foreach (int idx in topCategories)
{
    Console.WriteLine($"  {categories[idx]}");
}

Document Classification with Vision

using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:12b");

var categorizer = new Categorization(model);
categorizer.PreferredInferenceModality = InferenceModality.Vision;

var categories = new List<string>
{
    "Invoice",
    "Contract",
    "Resume",
    "Receipt",
    "ID Document"
};

// Classify a scanned document by its visual appearance
int result = categorizer.GetBestCategory(
    categories,
    new Attachment("scanned_document.pdf"),
    cancellationToken: CancellationToken.None
);

Console.WriteLine($"Document type: {categories[result]} ({categorizer.Confidence:P1})");

Sentiment Analysis

using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:4b");

var sentiment = new SentimentAnalysis(model);
sentiment.NeutralSupport = true; // Enable three-way classification

var category = sentiment.GetSentimentCategory(
    "The product arrived on time and works perfectly. Very happy with the purchase!",
    CancellationToken.None
);

Console.WriteLine($"Sentiment: {category} ({sentiment.Confidence:P1})");
// Example output (actual values depend on the model): Sentiment: Positive (97.3%)

Emotion Detection

using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3.5:4b");

var emotion = new EmotionDetection(model);

var category = emotion.GetEmotionCategory(
    "I've been waiting three weeks for a response and nobody seems to care.",
    CancellationToken.None
);

Console.WriteLine($"Emotion: {category} ({emotion.Confidence:P1})");
// Example output (actual values depend on the model): Emotion: Anger (89.7%)

Embedding-Based Classification

All three classifiers support an embedding-based mode that uses vector similarity rather than text completion. This can be faster for large category sets:

var categorizer = new Categorization(model);
categorizer.UseEmbeddingClassifier = true;

// Reuses model, categories, and content from the examples above.
// Embedding mode compares the input vector against the category vectors.
int index = categorizer.GetBestCategory(categories, content, cancellationToken: CancellationToken.None);
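
To make the idea of comparing an input vector against category vectors concrete, here is a small sketch of nearest-category selection by cosine similarity. It illustrates the general technique only and makes no claim about how LM-Kit.NET implements its embedding classifier. Cosine and NearestCategory are hypothetical helpers, and the three-dimensional vectors are invented; real embeddings come from an embedding model and have far more dimensions.

```csharp
using System;

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length.
double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) return 0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Index of the candidate vector most similar to the query vector.
int NearestCategory(double[] query, double[][] candidates)
{
    int best = 0;
    double bestScore = double.NegativeInfinity;
    for (int i = 0; i < candidates.Length; i++)
    {
        double similarity = Cosine(query, candidates[i]);
        if (similarity > bestScore)
        {
            bestScore = similarity;
            best = i;
        }
    }
    return best;
}

// Toy vectors invented for illustration.
string[] categoryNames = { "Billing Issue", "Technical Problem", "Account Access" };
double[][] categoryVectors =
{
    new[] { 0.9, 0.1, 0.0 },
    new[] { 0.1, 0.9, 0.2 },
    new[] { 0.0, 0.3, 0.9 },
};
double[] inputVector = { 0.1, 0.2, 0.8 };

Console.WriteLine(categoryNames[NearestCategory(inputVector, categoryVectors)]); // Account Access
```

Because category vectors can be computed once and reused, each new input costs one embedding pass plus a similarity scan, which is why this approach tends to scale well as the number of categories grows.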

Classification Strategies Compared

Strategy	Accuracy	Speed	Best For
Completion-based	Highest	Moderate	Complex or nuanced categories
Embedding-based	High	Fast	Large category sets, simple labels
With descriptions	Highest	Moderate	Ambiguous or overlapping categories
Vision mode	High	Slower	Scanned documents, images

Classification Use Cases

1. Customer Support Routing

Automatically route incoming tickets to the right department by classifying the topic, urgency, and sentiment.
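
As a rough sketch of how two classifier outputs can be combined into a routing decision: the topic label selects the queue and a negative sentiment raises the priority. Everything here is hypothetical, including the RouteTicket helper, the queue names, and the priority rule; the labels are passed in as plain strings where a real pipeline would take them from the classifiers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical label-to-queue map; the queue names are invented for illustration.
var queueByTopic = new Dictionary<string, string>
{
    ["Billing Issue"] = "billing",
    ["Technical Problem"] = "tech-support",
    ["Account Access"] = "identity",
};

// The topic chooses the queue; negative sentiment raises the priority.
// Topics without a mapping fall back to a general queue.
(string Queue, string Priority) RouteTicket(string topic, string sentiment)
{
    string queue = queueByTopic.TryGetValue(topic, out var mapped) ? mapped : "general";
    string priority = sentiment == "Negative" ? "high" : "normal";
    return (queue, priority);
}

Console.WriteLine(RouteTicket("Account Access", "Negative"));  // (identity, high)
Console.WriteLine(RouteTicket("Feature Request", "Positive")); // (general, normal)
```

Keeping the routing table as data rather than branching logic means a new category only needs a new dictionary entry.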

2. Content Moderation

Categorize user-generated content to flag inappropriate material, spam, or policy violations.

3. Document Triage

Classify incoming documents (invoices, contracts, forms, reports) and route them to the appropriate processing pipeline. This pairs naturally with Intelligent Document Processing (IDP).

4. Brand Monitoring

Track sentiment across reviews, social media posts, and feedback channels to measure customer satisfaction over time.
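
One simple way to turn per-review labels into a trend is a net sentiment score per period: positives minus negatives, divided by the total. The sketch below assumes that metric, which is a common convention rather than anything prescribed by LM-Kit.NET. NetSentiment is a hypothetical helper and the monthly data is invented; in practice each label would come from a sentiment classifier.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Invented sample data: one (month, sentiment label) pair per review.
var reviews = new List<(string Month, string Sentiment)>
{
    ("2024-01", "Positive"), ("2024-01", "Negative"), ("2024-01", "Positive"), ("2024-01", "Neutral"),
    ("2024-02", "Negative"), ("2024-02", "Negative"), ("2024-02", "Positive"),
};

// Net sentiment = (positives - negatives) / total, in the range [-1, 1].
// An empty period scores 0.
double NetSentiment(IEnumerable<string> sentiments)
{
    var labels = sentiments.ToList();
    if (labels.Count == 0) return 0.0;
    int positives = labels.Count(s => s == "Positive");
    int negatives = labels.Count(s => s == "Negative");
    return (double)(positives - negatives) / labels.Count;
}

foreach (var period in reviews.GroupBy(r => r.Month).OrderBy(g => g.Key))
{
    double score = NetSentiment(period.Select(r => r.Sentiment));
    Console.WriteLine($"{period.Key}: {score.ToString("F2", CultureInfo.InvariantCulture)}");
}
// Prints:
// 2024-01: 0.25
// 2024-02: -0.33
```

Neutral reviews count toward the total but not the numerator, so a period dominated by neutral feedback stays close to zero.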

5. Email Prioritization

Classify emails by intent (inquiry, complaint, order, spam) and urgency to surface the most important messages first.

6. Research Categorization

Tag academic papers, articles, or internal knowledge-base documents with topic labels for organized retrieval.

Key Terms

Classification: Assigning discrete labels to content based on its meaning
Multi-class Classification: Choosing one label from many possible categories
Multi-label Classification: Assigning multiple labels simultaneously to a single input
Sentiment Analysis: Classifying text polarity as positive, negative, or neutral
Emotion Detection: Identifying specific emotions (happiness, anger, sadness, fear)
Confidence Score: A value between 0 and 1 indicating the model's certainty in its classification
Embedding Classifier: A classification mode using vector similarity instead of text generation
Category Description: Optional explanatory text that helps the model disambiguate similar categories

Related API Documentation

Categorization: General-purpose multi-class classification
SentimentAnalysis: Polarity detection (positive, negative, neutral)
EmotionDetection: Fine-grained emotion recognition
Attachment: Universal document input for classification
InferenceModality: Processing mode (Text, Vision, Multimodal)

Related Glossary Topics

Structured Data Extraction: Extracting typed fields from content after classification
Named Entity Recognition (NER): Identifying entities within text
Extraction: Broader overview of all extraction capabilities
Embeddings: Vector representations used in embedding-based classification
Intelligent Document Processing (IDP): End-to-end document automation including classification
Vision Language Models (VLM): Multimodal models for image-based classification
Prompt Engineering: Crafting guidance to improve classification accuracy
Fine-Tuning: Training models on domain-specific classification data
LLM: Large Language Models powering classification
Inference: Model execution process for classification tasks
Dynamic Sampling: Neuro-symbolic framework ensuring reliable classification outputs
RAG (Retrieval-Augmented Generation): Combining retrieval with classification for knowledge-aware routing

External Resources

LM-Kit Custom Classification Demo: Custom category classification example
LM-Kit Sentiment Analysis Demo: Sentiment detection example
LM-Kit Emotion Detection Demo: Emotion classification example
LM-Kit Document Classification Demo: Document type classification example

Summary

Classification is the process of assigning predefined labels to content based on semantic understanding. In LM-Kit.NET, three dedicated classes cover the full classification spectrum: Categorization for custom multi-class labeling with optional descriptions, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All classifiers support both completion-based and embedding-based modes, accept multimodal inputs (text, images, PDFs, Office documents), and produce results with confidence scores for reliable automation. Combined with vision capabilities for document classification and the ability to return multiple top categories, LM-Kit.NET provides a complete on-device classification toolkit for customer support routing, content moderation, document triage, and sentiment monitoring.
