Understanding Classification in LM-Kit.NET
TL;DR
Classification is the task of assigning one or more predefined labels to a piece of content, whether that content is text, a document, or an image. In LM-Kit.NET, classification is powered by three main classes in the LMKit.TextAnalysis namespace: Categorization for general-purpose multi-class labeling, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All three support embedding-based and completion-based classification, accept multimodal inputs (text, images, PDFs, Office documents), and run entirely on-device, so content does not need to leave the machine.
What is Classification?
Definition: Classification is traditionally a supervised-learning task in which a model maps an input to one of a finite set of categories. With language models, the classifier instead relies on the model's semantic understanding to pick the most appropriate label for a given piece of content, so no task-specific training on labeled datasets is required.
The Classification Pipeline
+-------------------------------------------------------------------+
|                      Classification Pipeline                      |
+-------------------------------------------------------------------+
|                                                                   |
|    +-----------+     +----------------+     +----------------+    |
|    | Input     |     |   LM-Kit.NET   |     | Output         |    |
|    |           |---->|   Classifier   |---->|                |    |
|    | • Text    |     |                |     | • Category     |    |
|    | • Image   |     | • Categorize   |     | • Confidence   |    |
|    | • PDF     |     | • Sentiment    |     | • Top-N labels |    |
|    | • Office  |     | • Emotion      |     |                |    |
|    +-----------+     +----------------+     +----------------+    |
|                                                                   |
+-------------------------------------------------------------------+
Classification vs Generation
| Aspect | Text Generation | Classification |
|---|---|---|
| Output | Free-form text | Discrete label(s) |
| Determinism | Variable | Highly deterministic |
| Use Case | Creative writing, chat | Routing, filtering, analytics |
| Validation | Open-ended | Constrained to known labels |
Types of Classification
1. Single-Label Classification
The most common form: assign exactly one category to each input. Examples include spam detection, language identification, and document routing.
2. Multi-Label Classification
Assign multiple categories to a single input. For instance, a news article might be labeled both "Technology" and "Finance." In LM-Kit.NET, the GetTopCategories method supports this pattern by returning the top N matching labels.
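The top-N pattern itself can be illustrated without a model. The following is a minimal C# sketch, not LM-Kit.NET's implementation: it assumes a classifier has already produced one score per label, and the `scores` values, the `SelectTopLabels` helper, and the `minScore` cutoff are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-label scores in [0, 1]; in practice these would come from a classifier.
var scores = new Dictionary<string, double>
{
    ["Technology"] = 0.91,
    ["Finance"] = 0.78,
    ["Sports"] = 0.04,
    ["Politics"] = 0.22
};

// Keep at most maxLabels labels whose score reaches minScore, best first.
static List<string> SelectTopLabels(
    IReadOnlyDictionary<string, double> labelScores, int maxLabels, double minScore)
{
    return labelScores
        .Where(pair => pair.Value >= minScore)
        .OrderByDescending(pair => pair.Value)
        .Take(maxLabels)
        .Select(pair => pair.Key)
        .ToList();
}

List<string> labels = SelectTopLabels(scores, maxLabels: 3, minScore: 0.5);
Console.WriteLine(string.Join(", ", labels)); // Technology, Finance
```

Capping the count and applying a minimum score are separate choices: the cap bounds how many labels downstream code must handle, while the cutoff avoids attaching weak labels just to fill the quota.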
3. Sentiment Analysis
A specialized form of classification that determines the emotional polarity of text: Positive, Negative, or Neutral. Sentiment analysis is widely used in customer feedback processing, social media monitoring, and brand reputation tracking.
4. Emotion Detection
A finer-grained classification that identifies specific emotions: Happiness, Anger, Sadness, Fear, or Neutral. This enables nuanced understanding of user intent in support tickets, reviews, and conversational interfaces.
How LM-Kit.NET Implements Classification
LM-Kit.NET provides three dedicated classification classes, each optimized for its domain:
Architecture
+----------------------------------------------------------------------+
|                LM-Kit.NET Classification Architecture                |
+----------------------------------------------------------------------+
|                                                                      |
|  +----------------------------------------------------------------+  |
|  |                          Input Layer                           |  |
|  |     Text • Attachment • ImageBuffer • PDF • Office • HTML      |  |
|  +----------------------------------------------------------------+  |
|                                   |                                  |
|                                   v                                  |
|  +----------------------------------------------------------------+  |
|  |                     Classification Engine                      |  |
|  |                                                                |  |
|  | +------------------+ +------------------+ +------------------+ |  |
|  | | Categorization   | | SentimentAnalysis| | EmotionDetection | |  |
|  | |                  | |                  | |                  | |  |
|  | | • Custom labels  | | • Positive       | | • Happiness      | |  |
|  | | • Descriptions   | | • Negative       | | • Anger          | |  |
|  | | • Top-N results  | | • Neutral        | | • Sadness        | |  |
|  | | • Vision input   | |                  | | • Fear           | |  |
|  | +------------------+ +------------------+ +------------------+ |  |
|  |                                                                |  |
|  |           Modes: Completion-based | Embedding-based            |  |
|  +----------------------------------------------------------------+  |
|                                   |                                  |
|                                   v                                  |
|  +----------------------------------------------------------------+  |
|  |                          Output Layer                          |  |
|  |         Category Index • Confidence Score • Label Name         |  |
|  +----------------------------------------------------------------+  |
|                                                                      |
+----------------------------------------------------------------------+
General-Purpose Classification with Categorization
using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

// Load a model and create the classifier
var model = LM.LoadFromModelID("gemma3:4b");
var categorizer = new Categorization(model);

// Classify customer support tickets
var categories = new List<string>
{
    "Billing Issue",
    "Technical Problem",
    "Feature Request",
    "Account Access",
    "General Inquiry"
};

string ticket = "I can't log into my account after resetting my password.";
int bestIndex = categorizer.GetBestCategory(categories, ticket, cancellationToken: CancellationToken.None);

Console.WriteLine($"Category: {categories[bestIndex]}");
Console.WriteLine($"Confidence: {categorizer.Confidence:P1}");
// Example output (exact values depend on the model):
// Category: Account Access
// Confidence: 94.2%
Classification with Descriptions
Adding category descriptions improves accuracy for ambiguous labels:
// Reuses the categorizer created earlier; descriptions are listed in the same order as the categories
var categories = new List<string> { "Bug", "Enhancement", "Documentation" };
var descriptions = new List<string>
{
    "Software defect causing unexpected behavior",
    "Request for new functionality or improvement",
    "Missing, unclear, or incorrect documentation"
};

int index = categorizer.GetBestCategory(
    categories, descriptions,
    "The API returns a 500 error when sending a POST request with an empty body.",
    cancellationToken: CancellationToken.None
);
Multi-Label Classification
// `categories` is a list of labels as in the earlier examples; `content` is a placeholder for the text to classify
// Get top 3 matching categories
var topCategories = categorizer.GetTopCategories(
    categories, content,
    maxCategories: 3,
    cancellationToken: CancellationToken.None
);

foreach (int idx in topCategories)
{
    Console.WriteLine($"  {categories[idx]}");
}
Document Classification with Vision
using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Data;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:12b");
var categorizer = new Categorization(model);
categorizer.PreferredInferenceModality = InferenceModality.Vision;

var categories = new List<string>
{
    "Invoice",
    "Contract",
    "Resume",
    "Receipt",
    "ID Document"
};

// Classify a scanned document by its visual appearance
int result = categorizer.GetBestCategory(
    categories,
    new Attachment("scanned_document.pdf"),
    cancellationToken: CancellationToken.None
);

Console.WriteLine($"Document type: {categories[result]} ({categorizer.Confidence:P1})");
Sentiment Analysis
using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("gemma3:4b");
var sentiment = new SentimentAnalysis(model);
sentiment.NeutralSupport = true; // Enable three-way classification

var category = sentiment.GetSentimentCategory(
    "The product arrived on time and works perfectly. Very happy with the purchase!",
    CancellationToken.None
);

Console.WriteLine($"Sentiment: {category} ({sentiment.Confidence:P1})");
// Example output (exact values depend on the model): Sentiment: Positive (97.3%)
Emotion Detection
using System;
using System.Threading;
using LMKit.Model;
using LMKit.TextAnalysis;

var model = LM.LoadFromModelID("qwen3:4b");
var emotion = new EmotionDetection(model);

var category = emotion.GetEmotionCategory(
    "I've been waiting three weeks for a response and nobody seems to care.",
    CancellationToken.None
);

Console.WriteLine($"Emotion: {category} ({emotion.Confidence:P1})");
// Example output (exact values depend on the model): Emotion: Anger (89.7%)
Embedding-Based Classification
All three classifiers support an embedding-based mode that uses vector similarity rather than text completion. This can be faster for large category sets:
// `model`, `categories`, and `content` are assumed to be defined as in the earlier examples
var categorizer = new Categorization(model);
categorizer.UseEmbeddingClassifier = true;

// Embedding mode compares the input vector against the category vectors
int index = categorizer.GetBestCategory(categories, content, cancellationToken: CancellationToken.None);
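To make the idea of vector similarity concrete, here is a small sketch of nearest-category selection by cosine similarity. It is an illustration under stated assumptions rather than a description of LM-Kit.NET's internal logic: the three-dimensional vectors are made-up stand-ins for real embeddings, and `CosineSimilarity` is a hypothetical helper written for this example.

```csharp
using System;

// Toy 3-dimensional vectors standing in for real embeddings (assumed values).
string[] categoryNames = { "Billing Issue", "Technical Problem", "Account Access" };
double[][] categoryVectors =
{
    new[] { 0.9, 0.1, 0.0 },
    new[] { 0.1, 0.9, 0.2 },
    new[] { 0.0, 0.3, 0.9 }
};
double[] inputVector = { 0.1, 0.2, 0.8 };

// Cosine similarity: dot product divided by the product of the vector lengths.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Pick the category whose vector is most similar to the input vector.
int bestIndex = 0;
double bestScore = double.NegativeInfinity;
for (int i = 0; i < categoryVectors.Length; i++)
{
    double score = CosineSimilarity(inputVector, categoryVectors[i]);
    if (score > bestScore)
    {
        bestScore = score;
        bestIndex = i;
    }
}

Console.WriteLine($"Nearest category: {categoryNames[bestIndex]}"); // Nearest category: Account Access
```

Because category vectors can be computed once and reused, each new input needs only one embedding plus a similarity comparison per category, which is one plausible reason this style of classification scales well to large label sets.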
Classification Strategies Compared
| Strategy | Accuracy | Speed | Best For |
|---|---|---|---|
| Completion-based | Highest | Moderate | Complex or nuanced categories |
| Embedding-based | High | Fast | Large category sets, simple labels |
| With descriptions | Highest | Moderate | Ambiguous or overlapping categories |
| Vision mode | High | Slower | Scanned documents, images |
Classification Use Cases
1. Customer Support Routing
Automatically route incoming tickets to the right department by classifying the topic, urgency, and sentiment.
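A routing step built on a classification result might look like the following minimal sketch. It assumes the category label and confidence have already been obtained from a classifier; the queue names, the `Route` helper, and the 0.75 threshold are hypothetical values chosen for illustration, not part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mapping from category label to destination queue.
var queues = new Dictionary<string, string>
{
    ["Billing Issue"] = "finance-team",
    ["Technical Problem"] = "tech-support",
    ["Account Access"] = "identity-team"
};

// Route to the mapped queue only when the classifier is confident enough;
// otherwise fall back to manual triage.
string Route(string category, double confidence, double minConfidence = 0.75)
{
    if (confidence >= minConfidence && queues.TryGetValue(category, out var queue))
    {
        return queue;
    }
    return "manual-triage";
}

Console.WriteLine(Route("Account Access", 0.94));  // identity-team
Console.WriteLine(Route("Billing Issue", 0.52));   // manual-triage (low confidence)
Console.WriteLine(Route("Feature Request", 0.90)); // manual-triage (no queue mapped)
```

Sending low-confidence or unmapped results to a human queue keeps misrouted tickets rare; the threshold is something to tune against real traffic.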
2. Content Moderation
Categorize user-generated content to flag inappropriate material, spam, or policy violations.
3. Document Triage
Classify incoming documents (invoices, contracts, forms, reports) and route them to the appropriate processing pipeline. This use case pairs naturally with Intelligent Document Processing (IDP).
4. Brand Monitoring
Track sentiment across reviews, social media posts, and feedback channels to measure customer satisfaction over time.
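Once each item has a sentiment label, tracking it over time is a plain aggregation problem. The sketch below assumes reviews have already been classified and computes a simple net-sentiment score per month; the sample data, the month keys, and the net-sentiment formula are illustrative assumptions rather than an LM-Kit.NET feature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, already-classified reviews: (month, sentiment label).
var reviews = new List<(string Month, string Sentiment)>
{
    ("2024-01", "Positive"), ("2024-01", "Negative"), ("2024-01", "Positive"),
    ("2024-02", "Negative"), ("2024-02", "Neutral"), ("2024-02", "Negative")
};

// Net sentiment per month: (positive count - negative count) / total count,
// giving a value between -1 (all negative) and +1 (all positive).
var netByMonth = reviews
    .GroupBy(r => r.Month)
    .OrderBy(g => g.Key)
    .Select(g => (
        Month: g.Key,
        Net: (double)(g.Count(r => r.Sentiment == "Positive")
                    - g.Count(r => r.Sentiment == "Negative")) / g.Count()))
    .ToList();

foreach (var (month, net) in netByMonth)
{
    Console.WriteLine($"{month}: {net:F2}");
}
```

Neutral reviews count toward the total but not the numerator, so a month dominated by neutral feedback trends toward zero instead of skewing positive or negative.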
5. Email Prioritization
Classify emails by intent (inquiry, complaint, order, spam) and urgency to surface the most important messages first.
6. Research Categorization
Tag academic papers, articles, or internal knowledge-base documents with topic labels for organized retrieval.
Key Terms
- Classification: Assigning discrete labels to content based on its meaning
- Multi-class Classification: Choosing one label from many possible categories
- Multi-label Classification: Assigning multiple labels simultaneously to a single input
- Sentiment Analysis: Classifying text polarity as positive, negative, or neutral
- Emotion Detection: Identifying specific emotions (happiness, anger, sadness, fear) or a neutral state
- Confidence Score: A value between 0 and 1 indicating the model's certainty in its classification
- Embedding Classifier: A classification mode using vector similarity instead of text generation
- Category Description: Optional explanatory text that helps the model disambiguate similar categories
Related API Documentation
- Categorization: General-purpose multi-class classification
- SentimentAnalysis: Polarity detection (positive, negative, neutral)
- EmotionDetection: Fine-grained emotion recognition
- Attachment: Universal document input for classification
- InferenceModality: Processing mode (Text, Vision, Multimodal)
Related Glossary Topics
- Structured Data Extraction: Extracting typed fields from content after classification
- Named Entity Recognition (NER): Identifying entities within text
- Extraction: Broader overview of all extraction capabilities
- Embeddings: Vector representations used in embedding-based classification
- Intelligent Document Processing (IDP): End-to-end document automation including classification
- Vision Language Models (VLM): Multimodal models for image-based classification
- Prompt Engineering: Crafting guidance to improve classification accuracy
- Fine-Tuning: Training models on domain-specific classification data
- LLM: Large Language Models powering classification
- Inference: Model execution process for classification tasks
- Dynamic Sampling: Neuro-symbolic framework ensuring reliable classification outputs
- RAG (Retrieval-Augmented Generation): Combining retrieval with classification for knowledge-aware routing
External Resources
- LM-Kit Custom Classification Demo: Custom category classification example
- LM-Kit Sentiment Analysis Demo: Sentiment detection example
- LM-Kit Emotion Detection Demo: Emotion classification example
- LM-Kit Document Classification Demo: Document type classification example
Summary
Classification is the process of assigning predefined labels to content based on semantic understanding. In LM-Kit.NET, three dedicated classes cover the full classification spectrum: Categorization for custom multi-class labeling with optional descriptions, SentimentAnalysis for polarity detection, and EmotionDetection for fine-grained emotion recognition. All classifiers support both completion-based and embedding-based modes, accept multimodal inputs (text, images, PDFs, Office documents), and produce results with confidence scores for reliable automation. Combined with vision capabilities for document classification and the ability to return multiple top categories, LM-Kit.NET provides a complete on-device classification toolkit for customer support routing, content moderation, document triage, and sentiment monitoring.