What OCR Options Does LM-Kit.NET Provide?

TL;DR

LM-Kit.NET offers two OCR approaches. VLM OCR uses vision language models for high-accuracy recognition with semantic understanding of complex layouts, tables, formulas, and charts. LM-Kit OCR is a high-throughput engine built for speed and high accuracy on business documents, with advanced page layout handling. Choose VLM OCR when documents need semantic understanding (formulas, charts); choose LM-Kit OCR for high-volume business document processing where throughput matters most.

VLM OCR vs LM-Kit OCR

Feature	VLM OCR	LM-Kit OCR
Engine	Vision language model (AI)	High-throughput OCR engine
Complex layouts	Excellent (multi-column, nested)	Advanced (multi-column, business documents)
Table extraction	Preserves rows and columns	Limited
Formula recognition	Mathematical formulas and equations	Not supported
Chart interpretation	Extracts data from visualizations	Not supported
Output formats	Plain text, Markdown, coordinates	Plain text with word-level bounding boxes
Speed	Slower (model inference)	Very fast, optimized for high throughput
Memory	Requires loading a VLM model	Lightweight
Preprocessing	Handled by the model	Automatic deskew, rotation detection
Languages	Multilingual via model capabilities	Language-specific dictionaries

VLM OCR Models

Model	Size	Strengths
`paddleocr-vl-1.6:0.9b`	Ultra-compact (0.9B params)	Six task modes including coordinates and seal recognition
`glm-ocr`	Ultra-compact (0.9B params)	Document parsing, text, formula, table, and complex layout recognition across multiple languages
`lightonocr1025:1b`	Compact (1B params)	Layout-aware text extraction, Markdown output
`lightonocr-2:1b`	Compact (1B params)	RLVR-refined for accuracy, tables, receipts, forms, math
`glm-4.6v-flash`	10B	Vision + OCR + chat + tool calling. Strong at documents, screenshots, charts, tables. 131K context

General-purpose vision models with explicit OCR capability include the Qwen 3.5 family (`qwen3.5:4b`, `qwen3.5:9b`, `qwen3.5:27b`, `qwen3.5:35b-a3b`), the latest Qwen 3.6 family (`qwen3.6:27b`, `qwen3.6:35b-a3b`), the Qwen 3 VL family (`qwen3-vl:2b`, `qwen3-vl:4b`, `qwen3-vl:8b`, `qwen3-vl:30b`), and `minicpm-v-45`. They handle OCR alongside chat, tool calling, and reasoning in a single model.

VLM OCR Intents

VLM OCR exposes seven intents (task modes), selected through `VlmOcrIntent`:

Intent	What It Does
PlainText	Unformatted text extraction
Markdown	Structured output with headings, lists, emphasis
TableRecognition	Preserves table rows and columns
FormulaRecognition	Mathematical formulas and equations
ChartRecognition	Interprets charts and graphs
OcrWithCoordinates	Text plus pixel-level bounding boxes
SealRecognition	Stamps, seals, and official marks

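```csharp
using System;
using LMKit.Data;            // Attachment (namespace assumed; not shown in the original snippet)
using LMKit.Model;
using LMKit.Extraction.Ocr;

// Load a compact OCR-specialized vision language model by its model ID
using LM ocrModel = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b");

// Extract tables from a scanned document
var ocr = new VlmOcr(ocrModel, VlmOcrIntent.TableRecognition);
var result = ocr.Process(new Attachment("scanned-invoice.pdf"));

// Print the recognized text of the page
Console.WriteLine(result.PageElement.Text);
```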

LM-Kit OCR

LM-Kit OCR is engineered for high throughput, very high accuracy on business documents, and complex page layout handling. It excels at invoices, contracts, reports, forms, and multi-column layouts:

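```csharp
using System;
using LMKit.Data;            // Attachment (namespace assumed; not shown in the original snippet)
using LMKit.Extraction.Ocr;

// No model is loaded here: the engine is lightweight
var ocr = new LMKitOcr();
var result = ocr.Process(new Attachment("document.png"));

// Full text of the page
Console.WriteLine(result.PageText);

// Access word-level bounding boxes
foreach (var element in result.TextElements)
{
    Console.WriteLine($"'{element.Text}' at ({element.Left}, {element.Top}) size {element.Width}x{element.Height}");
}
```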
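
The word-level boxes can be regrouped into text lines, for example to rebuild reading order inside a region. The sketch below is an illustration only and not part of LM-Kit.NET: `WordBox` is a hypothetical stand-in record that mirrors the `Text`, `Left`, `Top`, `Width`, and `Height` members used above, and the grouping rule (a word joins a line when its vertical center lies within half the height of the line's first word) is an assumed heuristic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an OCR text element; it only mirrors the members used above.
public record WordBox(string Text, double Left, double Top, double Width, double Height);

public static class LineGrouper
{
    // Groups words into lines by vertical center, then orders each line left to right.
    public static List<string> GroupIntoLines(IEnumerable<WordBox> words)
    {
        var lines = new List<List<WordBox>>();

        // Visit words from top to bottom so each word is compared with the latest line.
        foreach (var word in words.OrderBy(w => w.Top + w.Height / 2))
        {
            double center = word.Top + word.Height / 2;

            if (lines.Count > 0)
            {
                var anchor = lines[^1][0];
                double anchorCenter = anchor.Top + anchor.Height / 2;

                // Assumed heuristic: same line if the centers differ by at most
                // half the height of the line's first word.
                if (Math.Abs(center - anchorCenter) <= anchor.Height / 2)
                {
                    lines[^1].Add(word);
                    continue;
                }
            }

            lines.Add(new List<WordBox> { word });
        }

        return lines
            .Select(line => string.Join(" ", line.OrderBy(w => w.Left).Select(w => w.Text)))
            .ToList();
    }
}
```

To use it with real results, each element of `result.TextElements` would first be copied into a `WordBox`. Skewed or multi-column pages would need a more careful rule than this one.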

LM-Kit OCR includes advanced processing capabilities:

Complex page layout analysis with intelligent reading order reconstruction for multi-column documents
Very high accuracy on business documents: invoices, contracts, reports, and forms
High-throughput processing optimized for large-scale batch workflows
Orientation detection and rotation (0, 90, 180, 270 degrees)
Deskewing of scanned documents
Language detection (when a vision-capable model is available)
On-demand download of language-specific dictionaries from Hugging Face
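
For large batches, a simple loop that isolates per-file failures is often enough to start with. The following is a sketch under assumptions: `BatchOcr` and its `recognize` delegate are hypothetical names introduced here, not LM-Kit.NET API. The delegate is the place where an engine call such as the `Process` example above would be plugged in.

```csharp
using System;
using System.Collections.Generic;

public static class BatchOcr
{
    // Runs `recognize` on every path. Each entry records whether the call
    // succeeded and either the recognized text or the error message, so one
    // unreadable document does not abort the whole batch.
    public static List<(string Path, bool Ok, string Output)> Run(
        IEnumerable<string> paths, Func<string, string> recognize)
    {
        var results = new List<(string Path, bool Ok, string Output)>();

        foreach (var path in paths)
        {
            try
            {
                results.Add((path, true, recognize(path)));
            }
            catch (Exception ex)
            {
                results.Add((path, false, ex.Message));
            }
        }

        return results;
    }
}
```

With the engine from the example above, the delegate could be `path => ocr.Process(new Attachment(path)).PageText`. That assumes a single `LMKitOcr` instance can be reused across files, which this page does not state.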

Supported Input Formats

Both OCR engines process documents through the Attachment class:

Category	Formats
Documents	PDF, DOCX, XLSX, PPTX, EML, MBOX
Images	PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, PSD, PNM, HDR, TGA

Multi-page documents are processed page by page.
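
A pre-flight check against the table above can reject unsupported files before they reach either engine. This is a minimal sketch: `SupportedFormats` is a hypothetical helper, and matching on the file extension alone is an assumption, since it does not inspect the file content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SupportedFormats
{
    // Extensions taken from the format table above. Aliases the table does not
    // list (for example ".tif") are deliberately left out.
    private static readonly HashSet<string> Documents = new(StringComparer.OrdinalIgnoreCase)
    {
        ".pdf", ".docx", ".xlsx", ".pptx", ".eml", ".mbox"
    };

    private static readonly HashSet<string> Images = new(StringComparer.OrdinalIgnoreCase)
    {
        ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".webp", ".psd", ".pnm", ".hdr", ".tga"
    };

    // Returns "Documents", "Images", or null when the extension is not listed.
    public static string Categorize(string path)
    {
        string extension = Path.GetExtension(path);
        if (Documents.Contains(extension)) return "Documents";
        if (Images.Contains(extension)) return "Images";
        return null;
    }
}
```

A caller could skip or log any path for which `Categorize` returns `null` before constructing an `Attachment`.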

When to Use Each

Scenario	Recommended
Complex layouts with tables and columns	VLM OCR
Mathematical formulas	VLM OCR (FormulaRecognition)
Charts and graphs	VLM OCR (ChartRecognition)
Need text position coordinates	VLM OCR (OcrWithCoordinates) or LM-Kit OCR
Business documents (invoices, contracts, forms)	LM-Kit OCR
High-volume batch processing	LM-Kit OCR
Complex multi-column page layouts	LM-Kit OCR or VLM OCR
Minimal memory footprint	LM-Kit OCR
Multilingual documents	VLM OCR
Integration with LLM extraction pipeline	Either (both plug into `TextExtraction`)
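
The decision table can be encoded as a small routing function. This is a sketch only: `OcrEngine`, `DocumentTraits`, and `OcrRouter` are hypothetical names, and the precedence used here (semantic content first, LM-Kit OCR as the default) is one assumed reading of the table, not an LM-Kit.NET rule.

```csharp
// Hypothetical names for illustration; none of these types exist in LM-Kit.NET.
public enum OcrEngine { VlmOcr, LMKitOcr }

// Describes what a workload contains. All flags default to false.
public record DocumentTraits(
    bool HasFormulas = false,
    bool HasCharts = false,
    bool HasTables = false,
    bool Multilingual = false);

public static class OcrRouter
{
    // Returns the recommended engine and, where the intents table names a
    // matching one, the intent. The intent is empty when none is implied.
    public static (OcrEngine Engine, string Intent) Choose(DocumentTraits traits)
    {
        if (traits.HasFormulas) return (OcrEngine.VlmOcr, "FormulaRecognition");
        if (traits.HasCharts) return (OcrEngine.VlmOcr, "ChartRecognition");
        if (traits.HasTables) return (OcrEngine.VlmOcr, "TableRecognition");
        if (traits.Multilingual) return (OcrEngine.VlmOcr, "");

        // Business documents, high-volume batches, and low-memory setups.
        return (OcrEngine.LMKitOcr, "");
    }
}
```

For example, `OcrRouter.Choose(new DocumentTraits(HasCharts: true))` selects VLM OCR with the `ChartRecognition` intent, while a plain invoice batch with no flags set falls through to LM-Kit OCR.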

Can LM-Kit.NET process images, PDFs, and audio in one application?: Full multimodal capabilities overview.
Convert Documents to Markdown with VLM OCR: Step-by-step VLM OCR guide.
Extract Tables from Documents with VLM OCR: Table extraction walkthrough.
Process Scanned Documents with OCR and Vision Models: End-to-end scanned document processing.
