What OCR Options Does LM-Kit.NET Provide?
TL;DR
LM-Kit.NET offers two OCR approaches. VLM OCR uses vision language models for high-accuracy recognition with semantic understanding of complex layouts, tables, formulas, and charts. LM-Kit OCR is a high-throughput engine built for speed and high accuracy on business documents, with advanced page layout handling. Choose VLM OCR when documents require semantic understanding (formulas, charts), and LM-Kit OCR for high-volume business document processing where throughput matters most.
VLM OCR vs LM-Kit OCR
| Feature | VLM OCR | LM-Kit OCR |
|---|---|---|
| Engine | Vision language model (AI) | High-throughput OCR engine |
| Complex layouts | Excellent (multi-column, nested) | Advanced (multi-column, business documents) |
| Table extraction | Preserves rows and columns | Limited |
| Formula recognition | Mathematical formulas and equations | Not supported |
| Chart interpretation | Extracts data from visualizations | Not supported |
| Output formats | Plain text, Markdown, coordinates | Plain text with word-level bounding boxes |
| Speed | Slower (model inference) | Very fast, optimized for high throughput |
| Memory | Requires loading a VLM model | Lightweight |
| Preprocessing | Handled by the model | Automatic deskew, rotation detection |
| Languages | Multilingual via model capabilities | Language-specific dictionaries |
VLM OCR Models
| Model | Size | Strengths |
|---|---|---|
| paddleocr-vl:0.9b | Ultra-compact (0.9B params) | Six task modes including coordinates and seal recognition |
| glm-ocr | Ultra-compact (0.9B params) | Document parsing, text, formula, table, and complex layout recognition across multiple languages |
| lightonocr1025:1b | Compact (1B params) | Layout-aware text extraction, Markdown output |
| lightonocr-2:1b | Compact (1B params) | RLVR-refined for accuracy, tables, receipts, forms, math |
| glm-4.6v-flash | 10B | Vision + OCR + chat + tool calling. Strong at documents, screenshots, charts, tables. 131K context |
General-purpose vision models with explicit OCR capability include the Qwen 3.5 family (qwen3.5:4b, qwen3.5:9b, qwen3.5:27b, qwen3.5:35b-a3b), the latest Qwen 3.6 family (qwen3.6:27b, qwen3.6:35b-a3b), the Qwen 3 VL family (qwen3-vl:2b, qwen3-vl:4b, qwen3-vl:8b, qwen3-vl:30b), and minicpm-v-45. They handle OCR alongside chat, tool calling, and reasoning in a single model.
VLM OCR Intents
VLM OCR supports seven intents, each selecting a distinct task mode:
| Intent | What It Does |
|---|---|
| PlainText | Unformatted text extraction |
| Markdown | Structured output with headings, lists, emphasis |
| TableRecognition | Preserves table rows and columns |
| FormulaRecognition | Mathematical formulas and equations |
| ChartRecognition | Interprets charts and graphs |
| OcrWithCoordinates | Text plus pixel-level bounding boxes |
| SealRecognition | Stamps, seals, and official marks |
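using LMKit.Data; // Attachment
using LMKit.Extraction.Ocr;
using LMKit.Model;
// Load the OCR model by ID; the using declaration disposes it at the end of the scope
using LM ocrModel = LM.LoadFromModelID("paddleocr-vl:0.9b");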
// Extract tables from a scanned document
var ocr = new VlmOcr(ocrModel, VlmOcrIntent.TableRecognition);
var result = ocr.Process(new Attachment("scanned-invoice.pdf"));
Console.WriteLine(result.PageElement.Text);
LM-Kit OCR
LM-Kit OCR is engineered for high throughput, very high accuracy on business documents, and complex page layout handling. It excels at invoices, contracts, reports, forms, and multi-column layouts:
using LMKit.Data; // Attachment
using LMKit.Extraction.Ocr;
var ocr = new LMKitOcr();
var result = ocr.Process(new Attachment("document.png"));
Console.WriteLine(result.PageText);
// Access word-level bounding boxes
foreach (var element in result.TextElements)
{
    Console.WriteLine($"'{element.Text}' at ({element.Left}, {element.Top}) size {element.Width}x{element.Height}");
}
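Word-level boxes make it possible to pull text out of a fixed region, such as the header band of an invoice. The sketch below assumes each element exposes `Text`, `Left`, `Top`, `Width`, and `Height` as in the loop above. The `Word` record and `RegionFilter` class are stand-ins defined here so the example runs without LM-Kit.NET, and the coordinate origin is assumed to be the top-left corner of the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for an OCR text element (hypothetical type); property names mirror the loop above.
public record Word(string Text, double Left, double Top, double Width, double Height);

public static class RegionFilter
{
    // Returns the text of all words whose boxes lie entirely inside the rectangle,
    // ordered by Top, then by Left.
    public static string TextInRegion(IEnumerable<Word> words,
        double left, double top, double width, double height)
    {
        var inside = words
            .Where(w => w.Left >= left && w.Top >= top
                     && w.Left + w.Width <= left + width
                     && w.Top + w.Height <= top + height)
            .OrderBy(w => w.Top)
            .ThenBy(w => w.Left);
        return string.Join(" ", inside.Select(w => w.Text));
    }
}
```

The ordering is deliberately naive: words on the same visual line whose `Top` values differ by a pixel or two may come out in the wrong order, so a real pipeline would probably group words into lines with a tolerance first.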
LM-Kit OCR includes advanced processing capabilities:
- Complex page layout analysis with intelligent reading order reconstruction for multi-column documents
- Very high accuracy on business documents: invoices, contracts, reports, and forms
- High-throughput processing optimized for large-scale batch workflows
- Orientation detection and rotation (0, 90, 180, 270 degrees)
- Deskewing of scanned documents
- Language detection (when a vision-capable model is available)
- On-demand download of language-specific dictionaries from Hugging Face
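For the batch workflows mentioned in the list above, one possible shape is a small driver that walks a folder and hands each file to an OCR callback. `BatchOcr.Run` and its `recognize` delegate are illustrative names, not LM-Kit.NET APIs. The delegate keeps the sketch independent of the engine, and the trailing comment shows how the `LMKitOcr` calls from the earlier example could be plugged in, assuming a single `LMKitOcr` instance can be reused across files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical batch driver: 'recognize' maps a file path to its recognized text.
public static class BatchOcr
{
    public static Dictionary<string, string> Run(
        string folder, string searchPattern, Func<string, string> recognize)
    {
        var results = new Dictionary<string, string>();

        // Sort the paths so the processing order is deterministic.
        var files = Directory.EnumerateFiles(folder, searchPattern)
                             .OrderBy(p => p, StringComparer.Ordinal);

        foreach (string path in files)
        {
            try
            {
                results[path] = recognize(path);
            }
            catch (Exception ex)
            {
                // One unreadable file should not abort the whole batch.
                Console.Error.WriteLine($"Skipped {path}: {ex.Message}");
            }
        }
        return results;
    }
}

// Possible wiring, reusing the calls shown earlier in this section
// (assumes one LMKitOcr instance may be shared across files):
//   var ocr = new LMKitOcr();
//   var texts = BatchOcr.Run("scans", "*.png", p => ocr.Process(new Attachment(p)).PageText);
```

Files that throw are logged to standard error and left out of the returned dictionary, so the caller can compare its size against the file count to detect failures.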
Supported Input Formats
Both OCR engines process documents through the Attachment class:
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, XLSX, PPTX, EML, MBOX |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, PSD, PNM, HDR, TGA |
Multi-page documents are processed page by page.
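A pre-flight check on the file extension can reject unsupported inputs before any model is loaded. The sketch below is a hypothetical helper (`InputFormats` is not part of LM-Kit.NET) that only encodes the table above. It looks at the extension, not the file contents, and it does not add aliases such as `.tif` that the table does not list.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of LM-Kit.NET: it mirrors the format table above.
public static class InputFormats
{
    private static readonly HashSet<string> Documents = new(StringComparer.OrdinalIgnoreCase)
    {
        ".pdf", ".docx", ".xlsx", ".pptx", ".eml", ".mbox"
    };

    private static readonly HashSet<string> Images = new(StringComparer.OrdinalIgnoreCase)
    {
        ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".webp", ".psd", ".pnm", ".hdr", ".tga"
    };

    // Returns "document", "image", or null when the extension is not in the table.
    public static string? Classify(string path)
    {
        string ext = Path.GetExtension(path);
        if (Documents.Contains(ext)) return "document";
        if (Images.Contains(ext)) return "image";
        return null;
    }
}
```

A caller could skip any path for which `InputFormats.Classify(path)` returns `null` before constructing an `Attachment`.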
When to Use Each
| Scenario | Recommended |
|---|---|
| Complex layouts with tables and columns | VLM OCR |
| Mathematical formulas | VLM OCR (FormulaRecognition) |
| Charts and graphs | VLM OCR (ChartRecognition) |
| Need text position coordinates | VLM OCR (OcrWithCoordinates) or LM-Kit OCR |
| Business documents (invoices, contracts, forms) | LM-Kit OCR |
| High-volume batch processing | LM-Kit OCR |
| Complex multi-column page layouts | LM-Kit OCR or VLM OCR |
| Minimal memory footprint | LM-Kit OCR |
| Multilingual documents | VLM OCR |
| Integration with LLM extraction pipeline | Either (both plug into TextExtraction) |
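The table above can be condensed into a small decision helper. This is only a sketch of one reasonable reading of it: `OcrNeeds` and `OcrChooser` are hypothetical names, and the rule that semantic requirements outrank throughput is an assumption based on the feature comparison, where LM-Kit OCR lists formulas and charts as not supported and table extraction as limited.

```csharp
using System;

// Hypothetical requirement flags derived from the scenario table above.
[Flags]
public enum OcrNeeds
{
    None = 0,
    Formulas = 1,
    Charts = 2,
    Tables = 4,
    Multilingual = 8,
    HighVolume = 16,
    LowMemory = 32
}

public static class OcrChooser
{
    // Returns "VLM OCR" or "LM-Kit OCR" following the scenario table.
    public static string Recommend(OcrNeeds needs)
    {
        // Scenarios for which the table recommends VLM OCR only.
        const OcrNeeds vlmOnly =
            OcrNeeds.Formulas | OcrNeeds.Charts | OcrNeeds.Tables | OcrNeeds.Multilingual;

        // Assumption: a semantic requirement wins even when throughput or memory also matters.
        if ((needs & vlmOnly) != 0) return "VLM OCR";

        // Business documents, batch jobs, and low-memory setups default to LM-Kit OCR.
        return "LM-Kit OCR";
    }
}
```

For example, `OcrChooser.Recommend(OcrNeeds.Charts | OcrNeeds.HighVolume)` returns "VLM OCR" under this rule, while `OcrChooser.Recommend(OcrNeeds.HighVolume | OcrNeeds.LowMemory)` returns "LM-Kit OCR".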
📚 Related Content
- Can LM-Kit.NET process images, PDFs, and audio in one application?: Full multimodal capabilities overview.
- Convert Documents to Markdown with VLM OCR: Step-by-step VLM OCR guide.
- Extract Tables from Documents with VLM OCR: Table extraction walkthrough.
- Process Scanned Documents with OCR and Vision Models: End-to-end scanned document processing.