What OCR Options Does LM-Kit.NET Provide?


TL;DR

LM-Kit.NET offers two OCR approaches. VLM OCR uses vision language models for high-accuracy recognition with semantic understanding of complex layouts, tables, formulas, and charts. LM-Kit OCR is a high-throughput engine built for speed and very high accuracy on business documents, with advanced page layout handling. Choose VLM OCR when documents require semantic understanding (formulas, charts), and LM-Kit OCR for high-volume business document processing where throughput matters most.


VLM OCR vs LM-Kit OCR

| Feature | VLM OCR | LM-Kit OCR |
|---|---|---|
| Engine | Vision language model (AI) | High-throughput OCR engine |
| Complex layouts | Excellent (multi-column, nested) | Advanced (multi-column, business documents) |
| Table extraction | Preserves rows and columns | Limited |
| Formula recognition | Mathematical formulas and equations | Not supported |
| Chart interpretation | Extracts data from visualizations | Not supported |
| Output formats | Plain text, Markdown, coordinates | Plain text with word-level bounding boxes |
| Speed | Slower (model inference) | Very fast, optimized for high throughput |
| Memory | Requires loading a VLM model | Lightweight |
| Preprocessing | Handled by the model | Automatic deskew, rotation detection |
| Languages | Multilingual via model capabilities | Language-specific dictionaries |

VLM OCR Models

| Model | Size | Strengths |
|---|---|---|
| paddleocr-vl:0.9b | Ultra-compact (0.9B params) | Six task modes including coordinates and seal recognition |
| glm-ocr | Ultra-compact (0.9B params) | Document parsing, text, formula, table, and complex layout recognition across multiple languages |
| lightonocr1025:1b | Compact (1B params) | Layout-aware text extraction, Markdown output |
| lightonocr-2:1b | Compact (1B params) | RLVR-refined for accuracy, tables, receipts, forms, math |
| glm-4.6v-flash | 10B | Vision + OCR + chat + tool calling. Strong at documents, screenshots, charts, tables. 131K context |

General-purpose vision models with explicit OCR capability include the Qwen 3.5 family (qwen3.5:4b, qwen3.5:9b, qwen3.5:27b, qwen3.5:35b-a3b), the latest Qwen 3.6 family (qwen3.6:27b, qwen3.6:35b-a3b), the Qwen 3 VL family (qwen3-vl:2b, qwen3-vl:4b, qwen3-vl:8b, qwen3-vl:30b), and minicpm-v-45. They handle OCR alongside chat, tool calling, and reasoning in a single model.


VLM OCR Intents

VLM OCR supports seven distinct task modes:

| Intent | What It Does |
|---|---|
| PlainText | Unformatted text extraction |
| Markdown | Structured output with headings, lists, emphasis |
| TableRecognition | Preserves table rows and columns |
| FormulaRecognition | Mathematical formulas and equations |
| ChartRecognition | Interprets charts and graphs |
| OcrWithCoordinates | Text plus pixel-level bounding boxes |
| SealRecognition | Stamps, seals, and official marks |

For example, to extract tables from a scanned document with paddleocr-vl:0.9b:

using System;
using LMKit.Data; // Attachment (assumed namespace)
using LMKit.Model;
using LMKit.Extraction.Ocr;

// Load the OCR model; the using declaration disposes it at end of scope
using LM ocrModel = LM.LoadFromModelID("paddleocr-vl:0.9b");

// Extract tables from a scanned document
var ocr = new VlmOcr(ocrModel, VlmOcrIntent.TableRecognition);
var result = ocr.Process(new Attachment("scanned-invoice.pdf"));
Console.WriteLine(result.PageElement.Text);

LM-Kit OCR

LM-Kit OCR is engineered for high throughput, very high accuracy on business documents, and complex page layout handling. It excels at invoices, contracts, reports, forms, and multi-column layouts:

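using System;
using LMKit.Data; // Attachment (assumed namespace)
using LMKit.Extraction.Ocr;

// Run LM-Kit OCR on a single image and print the recognized text
var ocr = new LMKitOcr();
var result = ocr.Process(new Attachment("document.png"));
Console.WriteLine(result.PageText);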

// Access word-level bounding boxes
foreach (var element in result.TextElements)
{
    Console.WriteLine($"'{element.Text}' at ({element.Left}, {element.Top}) size {element.Width}x{element.Height}");
}

LM-Kit OCR includes advanced processing capabilities:

  • Complex page layout analysis with intelligent reading order reconstruction for multi-column documents
  • Very high accuracy on business documents: invoices, contracts, reports, and forms
  • High-throughput processing optimized for large-scale batch workflows
  • Orientation detection and rotation (0, 90, 180, 270 degrees)
  • Deskewing of scanned documents
  • Language detection (when a vision-capable model is available)
  • On-demand download of language-specific dictionaries from Hugging Face

Supported Input Formats

Both OCR engines process documents through the Attachment class:

| Category | Formats |
|---|---|
| Documents | PDF, DOCX, XLSX, PPTX, EML, MBOX |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP, PSD, PNM, HDR, TGA |

Multi-page documents are processed page by page.
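
When batching files, it can help to filter by extension before constructing an Attachment. The helper below is a minimal sketch of that idea. `OcrInputFilter` and `IsSupported` are hypothetical names, not part of LM-Kit.NET, and the extension list simply mirrors the formats listed above. The check looks only at the file name, not at the file content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of LM-Kit.NET): filters files by extension
// using the formats listed in the Supported Input Formats table.
public static class OcrInputFilter
{
    // Case-insensitive so that "scan.PNG" and "scan.png" are treated alike.
    private static readonly HashSet<string> Supported = new(StringComparer.OrdinalIgnoreCase)
    {
        ".pdf", ".docx", ".xlsx", ".pptx", ".eml", ".mbox",
        ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif",
        ".webp", ".psd", ".pnm", ".hdr", ".tga"
    };

    // Returns true when the file name ends with one of the listed extensions.
    public static bool IsSupported(string path) =>
        !string.IsNullOrEmpty(path) && Supported.Contains(Path.GetExtension(path));
}
```

A caller could skip any path for which `OcrInputFilter.IsSupported(path)` returns false. Extension variants that the list above does not name (such as `.tif`) are deliberately left out of this sketch.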


When to Use Each

| Scenario | Recommended |
|---|---|
| Complex layouts with tables and columns | VLM OCR |
| Mathematical formulas | VLM OCR (FormulaRecognition) |
| Charts and graphs | VLM OCR (ChartRecognition) |
| Need text position coordinates | VLM OCR (OcrWithCoordinates) or LM-Kit OCR |
| Business documents (invoices, contracts, forms) | LM-Kit OCR |
| High-volume batch processing | LM-Kit OCR |
| Complex multi-column page layouts | LM-Kit OCR or VLM OCR |
| Minimal memory footprint | LM-Kit OCR |
| Multilingual documents | VLM OCR |
| Integration with LLM extraction pipeline | Either (both plug into TextExtraction) |
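
The recommendations above can be encoded as a small routing rule. This is only a sketch: `OcrEngine`, `OcrNeeds`, and `OcrEngineSelector` are hypothetical types, not part of LM-Kit.NET, and they cover only the scenarios with a single recommended engine. When requirements conflict, the sketch prefers VLM OCR, on the assumption that a capability LM-Kit OCR lacks outweighs throughput or memory.

```csharp
using System;

// Hypothetical types (not part of LM-Kit.NET) that encode the scenario table.
public enum OcrEngine { LMKitOcr, VlmOcr }

[Flags]
public enum OcrNeeds
{
    None = 0,
    Formulas = 1,
    Charts = 2,
    Tables = 4,
    Multilingual = 8,
    HighVolume = 16,
    LowMemory = 32
}

public static class OcrEngineSelector
{
    // Needs for which the table recommends VLM OCR only.
    private const OcrNeeds VlmOnly =
        OcrNeeds.Formulas | OcrNeeds.Charts | OcrNeeds.Tables | OcrNeeds.Multilingual;

    // Any VLM-only need selects VLM OCR; everything else falls back to LM-Kit OCR.
    public static OcrEngine Choose(OcrNeeds needs) =>
        (needs & VlmOnly) != 0 ? OcrEngine.VlmOcr : OcrEngine.LMKitOcr;
}
```

For instance, `OcrEngineSelector.Choose(OcrNeeds.HighVolume | OcrNeeds.LowMemory)` selects LM-Kit OCR, while adding `OcrNeeds.Formulas` to the same request switches the choice to VLM OCR.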
