👉 Try the demo:
https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/batch_document_classification
Batch Document Classification with AI in .NET Applications
🎯 Purpose of the Sample
Batch Document Classification demonstrates how to classify large volumes of heterogeneous documents using LM-Kit.NET. The sample processes files in parallel, assigns each document to the most relevant category, and automatically organizes outputs into category-based folders.
It supports images, PDFs, Office documents, text, and HTML files, making it suitable for real-world document pipelines.
👥 Target Audience
- Enterprise & B2B Apps – automate document intake and routing
- Back-office & Ops – sort incoming documents at scale
- Compliance & Archiving – pre-classify documents before review
- Demo & Benchmarking – measure throughput and confidence at scale
🚀 Problem Solved
- Eliminates manual sorting of mixed document folders
- Scales classification with configurable parallelism
- Applies a consistent taxonomy across thousands of files
- Organizes outputs automatically by category
💻 Sample Application Description
Console application that:
- Loads a local LM-Kit classification model.
- Scans an input directory recursively.
- Filters supported file types (images, PDFs, Office documents, text, and HTML).
- Classifies each document into a predefined category list.
- Runs in parallel with configurable thread count.
- Copies files into category-based output folders.
- Displays real-time progress, confidence, and performance metrics.
📂 Supported File Types
- Images: PNG, JPG, JPEG, TIFF, WEBP, BMP, GIF, PSD, HDR, TGA
- Documents: PDF, DOCX, XLSX, PPTX
- Text: TXT, HTML
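The recursive scan and file-type filter can be sketched in a few lines of C#. This is a minimal illustration and not the sample's actual code: the names `supportedExtensions`, `IsSupported`, and `FindDocuments` are hypothetical, and the extension spellings are inferred from the list above. It is written as top-level statements, which require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions inferred from the list above (assumption: one spelling per format).
var supportedExtensions = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".png", ".jpg", ".jpeg", ".tiff", ".webp", ".bmp", ".gif", ".psd", ".hdr", ".tga",
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".txt", ".html"
};

// True when the file's extension is in the supported set (case-insensitive).
bool IsSupported(string path) =>
    supportedExtensions.Contains(Path.GetExtension(path));

// Lazily walks the input folder and all subfolders, keeping only supported files.
IEnumerable<string> FindDocuments(string inputFolder) =>
    Directory.EnumerateFiles(inputFolder, "*", SearchOption.AllDirectories)
             .Where(IsSupported);
```

`Directory.EnumerateFiles` streams results as it walks the tree, so workers can start on the first files before a large input folder has been fully scanned.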
🏷️ Supported Categories
Examples include:
- Invoice, Receipt, Purchase Order
- Contract, Letter, Resume
- Bank Statement, Utility Bill, Pay Stub
- Passport, ID Card, Driver License
- Medical Record, Insurance Policy
- Shipping Document, Shipping Label
- Unknown (fallback)
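One way to turn a detected category into an output location is sketched below. The `categories` array and the `CopyToCategoryFolder` helper are hypothetical names, the array only repeats the example categories listed above, and the real sample may organize folders differently.

```csharp
using System;
using System.IO;

// Example categories from the list above; the sample's actual list may differ.
string[] categories =
{
    "Invoice", "Receipt", "Purchase Order", "Contract", "Letter", "Resume",
    "Bank Statement", "Utility Bill", "Pay Stub", "Passport", "ID Card",
    "Driver License", "Medical Record", "Insurance Policy",
    "Shipping Document", "Shipping Label", "Unknown"
};

// Copies a file into "<outputRoot>/<category>" and returns the new path.
// Categories that are not in the list fall back to "Unknown".
string CopyToCategoryFolder(string sourceFile, string outputRoot, string category)
{
    string folderName = Array.IndexOf(categories, category) >= 0 ? category : "Unknown";
    string targetFolder = Path.Combine(outputRoot, folderName);
    Directory.CreateDirectory(targetFolder);              // no-op if it already exists

    string targetPath = Path.Combine(targetFolder, Path.GetFileName(sourceFile));
    File.Copy(sourceFile, targetPath, overwrite: false);  // throws if the name is taken
    return targetPath;
}
```

Because `File.Copy` is called with `overwrite: false`, a name collision raises an `IOException` instead of silently replacing an earlier file.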
⚙️ Key Features
- ⚡ Parallel Processing – configurable number of threads
- 📁 Auto-Sorting – output folders per detected category
- 🧠 Confidence Scoring – per-document confidence value
- 📊 Live Metrics – throughput (documents per second) and average latency
- 🧩 Mixed Inputs – images and documents handled uniformly
- 🔁 Thread-Safe Design – shared model, per-thread categorizer
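The "shared model, per-thread categorizer" pattern maps naturally onto the `Parallel.ForEach` overload that takes per-worker local state. The sketch below is an assumption about how such a loop could look, not the sample's code: `ClassifyAll` and `createClassifier` are hypothetical, and the classifier is modeled as a plain delegate so that no LM-Kit API is implied.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Classifies files in parallel. 'createClassifier' is a hypothetical factory that
// stands in for building one classifier instance per worker over a shared model.
ConcurrentBag<(string File, string Category, double Confidence)> ClassifyAll(
    IEnumerable<string> files,
    int threadCount,
    Func<Func<string, (string Category, double Confidence)>> createClassifier)
{
    var results = new ConcurrentBag<(string File, string Category, double Confidence)>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = threadCount };

    Parallel.ForEach(
        files,
        options,
        createClassifier,                      // localInit: runs once per worker task
        (file, loopState, classify) =>         // body: reuses that worker's instance
        {
            var (category, confidence) = classify(file);
            results.Add((file, category, confidence));
            return classify;                   // pass the same instance to the next item
        },
        _ => { });                             // localFinally: nothing to release here

    return results;
}
```

The local state is created once per worker task, which is not guaranteed to correspond one-to-one with OS threads, but it still ensures that no classifier instance is used by two workers at the same time. `MaxDegreeOfParallelism` caps the number of concurrent workers at the configured thread count.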
🛠️ Getting Started
📋 Prerequisites
- .NET Framework 4.6.2 or .NET 8.0+
- LM-Kit.NET license key
📥 Download
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/batch_document_classification
▶️ Run
dotnet build
dotnet run
You will be prompted for:
- Input folder containing documents
- Output folder for classified files
- Number of processing threads
📈 Runtime Output
During execution, the console displays:
- File name and detected category
- Confidence score
- Per-document processing time
- Global progress and average latency
At completion:
- Total documents processed
- Documents per second
- Average confidence
- Error count (if any)
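The completion summary can be derived from per-document results as sketched here. The `Summarize` helper and its tuple shapes are illustrative assumptions. When documents are processed in parallel, throughput should be computed from elapsed wall-clock time and not from the sum of per-document latencies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Summarizes a finished run. Each tuple holds one document's confidence and
// processing time; 'wallClock' is the elapsed time of the whole batch.
(int Total, double DocsPerSecond, double AverageConfidence, double AverageLatencyMs) Summarize(
    IReadOnlyCollection<(double Confidence, TimeSpan Elapsed)> results,
    TimeSpan wallClock)
{
    if (results.Count == 0)
        return (0, 0, 0, 0);

    return (
        results.Count,
        results.Count / wallClock.TotalSeconds,             // throughput over wall-clock time
        results.Average(r => r.Confidence),
        results.Average(r => r.Elapsed.TotalMilliseconds)); // mean per-document latency
}
```

With several workers, documents per second can exceed the reciprocal of the average latency, because multiple documents are in flight at once.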
🔍 Notes
- The model is loaded once and shared across threads.
- Each worker thread uses its own Categorization instance.
- Unknown or ambiguous documents are routed to the Unknown category.
- Output file names are auto-deduplicated.
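File-name deduplication could work along the following lines. The `GetUniquePath` helper and the " (1)", " (2)" suffix format are assumptions made for illustration; the sample may use a different naming scheme.

```csharp
using System;
using System.IO;

// Returns a path inside 'folder' that does not collide with an existing file,
// appending " (1)", " (2)", ... before the extension when needed.
// 'exists' is injected so the check can be File.Exists or a test double.
string GetUniquePath(string folder, string fileName, Func<string, bool> exists)
{
    string candidate = Path.Combine(folder, fileName);
    string stem = Path.GetFileNameWithoutExtension(fileName);
    string extension = Path.GetExtension(fileName);

    for (int i = 1; exists(candidate); i++)
        candidate = Path.Combine(folder, $"{stem} ({i}){extension}");

    return candidate;
}
```

In production the helper would typically be called as `GetUniquePath(folder, name, File.Exists)`. With several worker threads, the check and the subsequent copy should be guarded (for example with a lock per output folder), since two workers could otherwise pick the same free name.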
🔧 Extend the Demo
- Customize the category taxonomy.
- Persist results to a database instead of folders.
- Add confidence thresholds for rejection or review queues.
- Integrate OCR preprocessing for scanned documents.
- Combine with RAG or extraction pipelines for downstream processing.
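As an example of the confidence-threshold extension, low-confidence results can be diverted to a review folder. The `ChooseFolder` helper, the 0.6 default threshold, and the "Review" folder name are all hypothetical choices for this sketch, and it assumes confidence is reported on a 0 to 1 scale.

```csharp
// Picks the destination folder name for a classified document.
// 'minConfidence' and the "Review" folder name are illustrative assumptions.
string ChooseFolder(string category, double confidence, double minConfidence = 0.6)
{
    // No usable category at all: use the fallback category.
    if (string.IsNullOrWhiteSpace(category))
        return "Unknown";

    // Confident results go to their category; the rest are queued for review.
    return confidence >= minConfidence ? category : "Review";
}
```

A suitable threshold depends on the model and the document mix, so it is best tuned against a labeled sample of real documents.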