Extract Text from Images and Documents with VLM OCR

Invoices, receipts, ID cards, contracts, and scanned letters sit in every enterprise pipeline as images or PDFs that downstream systems cannot read. LM-Kit.NET's VlmOcr engine, paired with the PaddleOCR VL 1.6 model, converts these into clean plain text on-device in a single API call. PaddleOCR VL 1.6 is a purpose-built 0.9B vision-language model that achieves 94.5% accuracy on OmniDocBench v1.5 while requiring only ~1 GB of VRAM. This tutorial walks through extracting text from single images, multi-page PDFs, and batch folders using the VlmOcrIntent.PlainText intent.

Why PaddleOCR VL for Document Text Extraction

PaddleOCR VL offers two practical advantages over traditional OCR engines:

Robustness on real-world inputs. PaddleOCR VL handles skewed scans, phone-captured photos, low-resolution faxes, and mixed-language documents, usually without preprocessing (very small or extremely low-contrast images are the exception; see Common Issues). It was trained and benchmarked across five challenging scenarios: scanning, skew, warping, screen photography, and uneven illumination.
Ultra-compact footprint. At 0.9B parameters and ~750 MB on disk, PaddleOCR VL runs on laptops, edge devices, and CI runners without a dedicated GPU. This makes it practical for always-on ingestion pipelines and on-device privacy-first workloads.

Prerequisites

Requirement	Minimum
.NET SDK	8.0+
VRAM	~1 GB (PaddleOCR VL 1.6)
Disk	~750 MB free for model download

Input formats: scanned PDF, PNG, JPEG, TIFF, BMP, WebP, DOCX, XLSX, PPTX, EML.

Step 1: Create the Project

dotnet new console -n OcrTextExtraction
cd OcrTextExtraction
dotnet add package LM-Kit.NET

Step 2: Extract Text from a Single Image

Load the PaddleOCR VL model and extract text from an image file using the VlmOcrIntent.PlainText intent:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Extract text using PlainText intent
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);

var attachment = new Attachment("scanned_receipt.png");

VlmOcr.VlmOcrResult result = ocr.Run(attachment);

string extractedText = result.PageElement.Text;
Console.WriteLine(extractedText);

// Optional: save to file
File.WriteAllText("receipt_text.txt", extractedText);
Console.WriteLine("\nSaved to receipt_text.txt");

The PlainText intent tells the engine to extract unformatted text. The engine maps this to the best available instruction for the loaded model (for example, "OCR:" for PaddleOCR VL). This works on any document type: invoices, letters, forms, ID cards, labels, and more.

Step 3: Extract Text from a Multi-Page PDF

Process each page of a scanned PDF and concatenate the results:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Multi-page PDF extraction
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

string pdfPath = "scanned_contract.pdf";
var attachment = new Attachment(pdfPath);

int pageCount = attachment.PageCount;
Console.WriteLine($"Processing {pageCount} pages from {Path.GetFileName(pdfPath)}...\n");

var fullText = new StringBuilder();

for (int page = 0; page < pageCount; page++)
{
    Console.Write($"  Page {page + 1}/{pageCount}... ");

    VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
    string pageText = pageResult.PageElement.Text;

    fullText.AppendLine($"--- Page {page + 1} ---");
    fullText.AppendLine(pageText);
    fullText.AppendLine();

    Console.WriteLine($"{pageResult.TextGeneration.GeneratedTokenCount} tokens generated");
}

string outputPath = Path.ChangeExtension(pdfPath, ".txt");
File.WriteAllText(outputPath, fullText.ToString());
Console.WriteLine($"\nSaved {pageCount} pages to {outputPath}");
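Downstream code sometimes needs the pages back as separate strings. The sketch below shows one way to split the combined file on the `--- Page N ---` markers written by the loop above. `SplitPages` is a hypothetical helper, not part of LM-Kit.NET, and it assumes the recognized text never contains a line that looks exactly like a marker.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits text written in the "--- Page N ---" layout
// back into a page-number -> text dictionary.
static Dictionary<int, string> SplitPages(string combined)
{
    var pages = new Dictionary<int, string>();

    // A marker is a full line such as "--- Page 3 ---" (tolerates CRLF line endings).
    var marker = new Regex(@"^--- Page (\d+) ---\r?$", RegexOptions.Multiline);
    MatchCollection matches = marker.Matches(combined);

    for (int i = 0; i < matches.Count; i++)
    {
        // A page's text runs from the end of its marker to the start of the next one.
        int start = matches[i].Index + matches[i].Length;
        int end = i + 1 < matches.Count ? matches[i + 1].Index : combined.Length;
        int pageNumber = int.Parse(matches[i].Groups[1].Value);
        pages[pageNumber] = combined.Substring(start, end - start).Trim();
    }

    return pages;
}

// Usage with a small in-memory sample instead of a real OCR output file.
string sample = "--- Page 1 ---\nInvoice 001\n\n--- Page 2 ---\nTotal: 42.00\n\n";
Dictionary<int, string> result = SplitPages(sample);
Console.WriteLine(result[2]);
```

If the marker format could collide with real document content, writing one file per page is a safer design than parsing markers afterwards.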

Step 4: Batch Process a Folder of Documents

Convert an entire folder of scanned images and PDFs into text files:

using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Batch process all documents
// ──────────────────────────────────────
string inputDir = "inbox";
string outputDir = "extracted_text";
Directory.CreateDirectory(outputDir);

string[] supportedExtensions = { ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".pdf" };

string[] files = Directory.GetFiles(inputDir)
    .Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

Console.WriteLine($"Processing {files.Length} files...\n");

var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

var stopwatch = Stopwatch.StartNew();

foreach (string file in files)
{
    string fileName = Path.GetFileName(file);
    Console.Write($"  {fileName}... ");

    var attachment = new Attachment(file);
    var text = new StringBuilder();

    for (int p = 0; p < attachment.PageCount; p++)
    {
        VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: p);
        text.AppendLine(pageResult.PageElement.Text);
    }

    // Append ".txt" to the full file name so "scan.png" and "scan.pdf" do not overwrite each other.
    string outPath = Path.Combine(outputDir, fileName + ".txt");
    File.WriteAllText(outPath, text.ToString());
    Console.WriteLine($"{attachment.PageCount} page(s) done");
}

stopwatch.Stop();
Console.WriteLine($"\nProcessed {files.Length} files in {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Output saved to {outputDir}/");
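In the loop above, a single unreadable file stops the whole batch with an exception. The following is a minimal sketch of a fault-tolerant variant, written against a plain delegate so it runs without a model. `ProcessAll` and its `extract` parameter are hypothetical names; in a real pipeline `extract` would wrap the per-page `ocr.Run` loop from this step.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: runs `extract` on every file and records failures
// instead of letting one exception abort the batch.
static List<(string Input, string Error)> ProcessAll(
    IEnumerable<string> files, string outputDir, Func<string, string> extract)
{
    var failures = new List<(string Input, string Error)>();
    Directory.CreateDirectory(outputDir);

    foreach (string input in files)
    {
        try
        {
            string text = extract(input);
            // Keep the original extension in the output name to avoid collisions.
            string outPath = Path.Combine(outputDir, Path.GetFileName(input) + ".txt");
            File.WriteAllText(outPath, text);
        }
        catch (Exception ex)
        {
            // Remember the failure and continue with the next file.
            failures.Add((input, ex.Message));
        }
    }

    return failures;
}

// Demo with a fake extractor: the second input fails, the first is still written.
string demoDir = Path.Combine(Path.GetTempPath(), "ocr_batch_demo");
var failed = ProcessAll(
    new[] { "a.png", "broken.pdf" },
    demoDir,
    f => f.StartsWith("broken") ? throw new InvalidDataException("unreadable") : "text of " + f);

foreach (var (path, reason) in failed)
    Console.WriteLine($"FAILED {path}: {reason}");
```

Returning the failure list rather than logging inside the helper leaves the retry or quarantine policy to the caller.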

Step 5: Performance Metrics

Track throughput and quality for production monitoring:

using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Run OCR and collect metrics
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);

var attachment = new Attachment("document.png");

var stopwatch = Stopwatch.StartNew();
VlmOcr.VlmOcrResult result = ocr.Run(attachment);
stopwatch.Stop();

Console.WriteLine($"Tokens generated : {result.TextGeneration.GeneratedTokenCount}");
Console.WriteLine($"Time elapsed     : {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Speed            : {result.TextGeneration.TokenGenerationRate:F1} tokens/s");
Console.WriteLine($"Quality score    : {result.TextGeneration.QualityScore:F2}");
Console.WriteLine($"Context usage    : {result.TextGeneration.ContextTokens.Count}/{result.TextGeneration.ContextSize}");
Console.WriteLine($"Stop reason      : {result.TextGeneration.TerminationReason}");
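For a multi-page or batch run, per-call numbers are usually rolled up into a single throughput figure. The sketch below assumes you record the generated token count and the stopwatch seconds for each `ocr.Run` call. `Summarize` is a hypothetical helper and the sample values are made up for illustration, not measured results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: aggregates per-page samples into overall
// tokens per second and pages per minute (wall-clock based).
static (double TokensPerSecond, double PagesPerMinute) Summarize(
    IReadOnlyCollection<(int Tokens, double Seconds)> samples)
{
    double totalSeconds = samples.Sum(s => s.Seconds);
    if (samples.Count == 0 || totalSeconds <= 0)
        return (0, 0); // nothing measured yet

    double totalTokens = samples.Sum(s => s.Tokens);
    return (totalTokens / totalSeconds, samples.Count / totalSeconds * 60.0);
}

// Illustrative values only: one tuple per page (token count, elapsed seconds).
var samples = new List<(int Tokens, double Seconds)>
{
    (400, 2.0),
    (600, 3.0),
};

var summary = Summarize(samples);
Console.WriteLine($"{summary.TokensPerSecond:F1} tokens/s, {summary.PagesPerMinute:F1} pages/min");
```

Dividing total tokens by total time weights long pages correctly, whereas averaging the per-page rates would over-represent short pages.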

Industry Use Cases

Industry	Document Type	What You Extract
Finance	Invoices, receipts, bank statements	Line items, totals, dates, account numbers
Healthcare	Prescriptions, lab reports, referral letters	Patient info, medication names, test results
Legal	Contracts, court filings, notarized documents	Clauses, parties, dates, signatures
Logistics	Shipping labels, packing slips, customs forms	Tracking numbers, addresses, weight, item counts
Insurance	Claim forms, policy documents, accident reports	Claim numbers, coverage details, descriptions
Government	ID cards, permits, tax forms	Names, ID numbers, addresses, filing data

Common Issues

Problem	Cause	Fix
Output truncated mid-sentence	`MaximumCompletionTokens` too low	Increase to 4096 or higher
Blank or garbled output	Image too small or extremely low contrast	Resize or enhance image before processing
Mixed-language text partially recognized	Model defaults to dominant language	PaddleOCR VL handles 32 languages natively; ensure input resolution is adequate
Slow on CPU-only machines	Model runs in FP32 on CPU	Use Q4_K_M quantization (default) and ensure AVX2 support

Next Steps

Extract Tables from Documents with VLM OCR: use VlmOcrIntent.TableRecognition for structured table output.
Recognize Mathematical Formulas with VLM OCR: extract LaTeX from equations, homework, and textbooks.
Extract Data from Charts and Graphs with VLM OCR: pull data from bar charts, pie charts, and line graphs.
Convert Documents to Markdown with VLM OCR: produce Markdown instead of plain text using larger VLMs.
Samples: VLM OCR Demo: interactive console demo with all OCR intents.
