Extract Text from Images and Documents with VLM OCR

Invoices, receipts, ID cards, contracts, and scanned letters sit in every enterprise pipeline as images or PDFs that downstream systems cannot read. LM-Kit.NET's VlmOcr engine, paired with the PaddleOCR VL 1.5 model, converts these into clean plain text on-device in a single API call. PaddleOCR VL 1.5 is a purpose-built 0.9B vision-language model that achieves 94.5% accuracy on OmniDocBench v1.5 while requiring only ~1 GB of VRAM. This tutorial walks through extracting text from single images, multi-page PDFs, and batch folders using the VlmOcrIntent.PlainText intent.


Why PaddleOCR VL for Document Text Extraction

PaddleOCR VL offers two practical advantages over traditional OCR engines:

  1. Robustness on real-world inputs. PaddleOCR VL handles skewed scans, phone-captured photos, low-resolution faxes, and mixed-language documents without any preprocessing. It was trained and benchmarked across five challenging scenarios: scanning, skew, warping, screen photography, and uneven illumination.
  2. Ultra-compact footprint. At 0.9B parameters and ~750 MB on disk, PaddleOCR VL runs on laptops, edge devices, and CI runners without a dedicated GPU. This makes it practical for always-on ingestion pipelines and on-device privacy-first workloads.

Prerequisites

Requirement   Minimum
.NET SDK      8.0+
VRAM          ~1 GB (PaddleOCR VL 1.5)
Disk          ~750 MB free for model download

Input formats: scanned PDF, PNG, JPEG, TIFF, BMP, WebP, DOCX, XLSX, PPTX, EML.


Step 1: Create the Project

dotnet new console -n OcrTextExtraction
cd OcrTextExtraction
dotnet add package LM-Kit.NET

Step 2: Extract Text from a Single Image

Load the PaddleOCR VL model and extract text from an image file with the PlainText intent, which the engine translates into the model's OCR: instruction:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Extract text using PlainText intent
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);

var attachment = new Attachment("scanned_receipt.png");

VlmOcr.VlmOcrResult result = ocr.Run(attachment);

string extractedText = result.PageElement.Text;
Console.WriteLine(extractedText);

// Optional: save to file
File.WriteAllText("receipt_text.txt", extractedText);
Console.WriteLine("\nSaved to receipt_text.txt");

The PlainText intent tells the engine to extract unformatted text. The engine maps this to the best available instruction for the loaded model (for example, "OCR:" for PaddleOCR VL). This works on any document type: invoices, letters, forms, ID cards, labels, and more.


Step 3: Extract Text from a Multi-Page PDF

Process each page of a scanned PDF and concatenate the results:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Multi-page PDF extraction
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

string pdfPath = "scanned_contract.pdf";
var attachment = new Attachment(pdfPath);

int pageCount = attachment.PageCount;
Console.WriteLine($"Processing {pageCount} pages from {Path.GetFileName(pdfPath)}...\n");

var fullText = new StringBuilder();

for (int page = 0; page < pageCount; page++)
{
    Console.Write($"  Page {page + 1}/{pageCount}... ");

    VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
    string pageText = pageResult.PageElement.Text;

    fullText.AppendLine($"--- Page {page + 1} ---");
    fullText.AppendLine(pageText);
    fullText.AppendLine();

    Console.WriteLine($"{pageResult.TextGeneration.GeneratedTokenCount} tokens generated");
}

string outputPath = Path.ChangeExtension(pdfPath, ".txt");
File.WriteAllText(outputPath, fullText.ToString());
Console.WriteLine($"\nSaved {pageCount} pages to {outputPath}");
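
Because the loop above writes a "--- Page N ---" marker before each page, the saved file can be split back into pages later, for example to index each page separately. The following is a minimal sketch that uses only the .NET base class library. SplitPages is a hypothetical helper name, not part of LM-Kit.NET, and the sketch assumes that no recognized line of text happens to match the marker pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Splits text written in the "--- Page N ---" layout back into per-page strings.
// Returns a dictionary keyed by the 1-based page number found in each marker.
// Hypothetical helper: not an LM-Kit.NET API.
static Dictionary<int, string> SplitPages(string fullText)
{
    var result = new Dictionary<int, string>();
    var marker = new Regex(@"^--- Page (\d+) ---\s*$");
    var current = new StringBuilder();
    int currentPage = 0; // 0 means no marker has been seen yet

    foreach (string line in fullText.Split('\n'))
    {
        string trimmed = line.TrimEnd('\r');
        Match m = marker.Match(trimmed);
        if (m.Success)
        {
            // A new marker closes the previous page, if there was one.
            if (currentPage > 0) result[currentPage] = current.ToString().Trim();
            currentPage = int.Parse(m.Groups[1].Value);
            current.Clear();
        }
        else if (currentPage > 0)
        {
            current.AppendLine(trimmed);
        }
    }

    // Store the final page, which has no marker after it.
    if (currentPage > 0) result[currentPage] = current.ToString().Trim();
    return result;
}

// Usage with a small in-memory sample instead of a real OCR result.
string sample = "--- Page 1 ---\nFirst page text\n\n--- Page 2 ---\nSecond page text\n\n";
Dictionary<int, string> pages = SplitPages(sample);
Console.WriteLine($"{pages.Count} pages; page 2 = \"{pages[2]}\"");
```

In a real pipeline you would pass File.ReadAllText(outputPath) instead of the sample string. If exact page boundaries matter, writing one file per page avoids depending on a text marker at all.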

Step 4: Batch Process a Folder of Documents

Convert an entire folder of scanned images and PDFs into text files:

using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Batch process all documents
// ──────────────────────────────────────
string inputDir = "inbox";
string outputDir = "extracted_text";
Directory.CreateDirectory(outputDir);

// Both ".tif" and ".tiff" are listed because TIFF scans commonly use either extension.
string[] supportedExtensions = { ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp", ".pdf" };

string[] files = Directory.GetFiles(inputDir)
    .Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

Console.WriteLine($"Processing {files.Length} files...\n");

var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

var stopwatch = Stopwatch.StartNew();

foreach (string file in files)
{
    string fileName = Path.GetFileName(file);
    Console.Write($"  {fileName}... ");

    var attachment = new Attachment(file);
    var text = new StringBuilder();

    for (int p = 0; p < attachment.PageCount; p++)
    {
        VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: p);
        text.AppendLine(pageResult.PageElement.Text);
    }

    string outPath = Path.Combine(outputDir, Path.ChangeExtension(fileName, ".txt"));
    File.WriteAllText(outPath, text.ToString());
    Console.WriteLine($"{attachment.PageCount} page(s) done");
}

stopwatch.Stop();
Console.WriteLine($"\nProcessed {files.Length} files in {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Output saved to {outputDir}/");
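
One detail to watch in the batch loop: Path.ChangeExtension maps invoice.png and invoice.pdf to the same invoice.txt, so the second file overwrites the output of the first. A possible workaround is sketched below with standard .NET calls only. UniqueOutputName is a hypothetical helper, not an LM-Kit.NET API, and the file names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Builds a .txt output name for an input file and records it in 'taken'.
// When two inputs share a base name, the second keeps its source extension
// in the output name ("invoice.pdf.txt") instead of overwriting the first.
// Hypothetical helper: not an LM-Kit.NET API.
static string UniqueOutputName(string inputFile, HashSet<string> taken)
{
    string fileName = Path.GetFileName(inputFile);
    string candidate = Path.ChangeExtension(fileName, ".txt");
    if (!taken.Add(candidate))
    {
        // Collision: fall back to a name that includes the original extension.
        candidate = fileName + ".txt";
        taken.Add(candidate);
    }
    return candidate;
}

// Case-insensitive set, since Windows treats Invoice.txt and invoice.txt as one file.
var usedNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
string[] inputs = { "inbox/invoice.png", "inbox/invoice.pdf", "inbox/letter.jpg" };

var names = new List<string>();
foreach (string input in inputs)
{
    names.Add(UniqueOutputName(input, usedNames));
}
Console.WriteLine(string.Join(", ", names));
```

In the batch loop, the result would replace the Path.ChangeExtension(fileName, ".txt") argument passed to Path.Combine.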

Step 5: Performance Metrics

Track throughput and quality for production monitoring:

using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Run OCR and collect metrics
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);

var attachment = new Attachment("document.png");

var stopwatch = Stopwatch.StartNew();
VlmOcr.VlmOcrResult result = ocr.Run(attachment);
stopwatch.Stop();

Console.WriteLine($"Tokens generated : {result.TextGeneration.GeneratedTokenCount}");
Console.WriteLine($"Time elapsed     : {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Speed            : {result.TextGeneration.TokenGenerationRate:F1} tokens/s");
Console.WriteLine($"Quality score    : {result.TextGeneration.QualityScore:F2}");
Console.WriteLine($"Context usage    : {result.TextGeneration.ContextTokens.Count}/{result.TextGeneration.ContextSize}");
Console.WriteLine($"Stop reason      : {result.TextGeneration.TerminationReason}");
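
To monitor a whole batch rather than a single call, the per-call numbers can be rolled up into throughput figures. The sketch below assumes you record the generated token count and the Stopwatch time for each ocr.Run call yourself. The sample values are made up, and Summarize is a hypothetical helper rather than part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rolls per-call measurements up into batch-level throughput figures.
// Hypothetical helper: not an LM-Kit.NET API.
static (double PagesPerSecond, double TokensPerSecond, double SlowestSeconds) Summarize(
    IReadOnlyList<(int Tokens, double Seconds)> runs)
{
    if (runs.Count == 0) return (0, 0, 0);

    double totalSeconds = runs.Sum(r => r.Seconds);
    int totalTokens = runs.Sum(r => r.Tokens);
    double slowest = runs.Max(r => r.Seconds);

    // Guard against a zero denominator if every call was too fast to measure.
    if (totalSeconds <= 0) return (0, 0, slowest);

    return (runs.Count / totalSeconds, totalTokens / totalSeconds, slowest);
}

// Made-up samples: (tokens generated, seconds elapsed) for three pages.
var samples = new List<(int Tokens, double Seconds)>
{
    (400, 2.0),
    (350, 1.5),
    (250, 1.5),
};

var summary = Summarize(samples);
Console.WriteLine(
    $"Pages/s: {summary.PagesPerSecond:F2}, tokens/s: {summary.TokensPerSecond:F1}, " +
    $"slowest page: {summary.SlowestSeconds:F1}s");
```

Tracking the slowest page alongside the averages helps spot outliers, such as a dense page that approaches the completion token limit.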

Industry Use Cases

Industry    Document Type                                    What You Extract
Finance     Invoices, receipts, bank statements              Line items, totals, dates, account numbers
Healthcare  Prescriptions, lab reports, referral letters     Patient info, medication names, test results
Legal       Contracts, court filings, notarized documents    Clauses, parties, dates, signatures
Logistics   Shipping labels, packing slips, customs forms    Tracking numbers, addresses, weight, item counts
Insurance   Claim forms, policy documents, accident reports  Claim numbers, coverage details, descriptions
Government  ID cards, permits, tax forms                     Names, ID numbers, addresses, filing data

Common Issues

Problem                                   Cause                                      Fix
Output truncated mid-sentence             MaximumCompletionTokens too low            Increase to 4096 or higher
Blank or garbled output                   Image too small or extremely low contrast  Resize or enhance image before processing
Mixed-language text partially recognized  Model defaults to dominant language        PaddleOCR VL handles 32 languages natively; ensure input resolution is adequate
Slow on CPU-only machines                 Model runs in FP32 on CPU                  Use Q4_K_M quantization (default) and ensure AVX2 support
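
For the truncation case, one option is to retry a page with a larger token budget instead of setting a high limit for every page. The sketch below shows only the retry logic. The runOcr delegate is a stand-in for your own code that sets MaximumCompletionTokens, calls ocr.Run, and decides whether the output was cut off, for example by inspecting TerminationReason. How that check maps to specific LM-Kit.NET values is not covered here and is left as an assumption. RunWithGrowingBudget is a hypothetical helper name.

```csharp
using System;

// Calls runOcr with a doubling token budget until the output is no longer
// reported as truncated or the ceiling is reached.
// Hypothetical helper: not an LM-Kit.NET API.
static (string Text, int BudgetUsed) RunWithGrowingBudget(
    Func<int, (string Text, bool Truncated)> runOcr, int startBudget, int maxBudget)
{
    if (startBudget <= 0) throw new ArgumentOutOfRangeException(nameof(startBudget));

    int budget = Math.Min(startBudget, maxBudget);
    while (true)
    {
        (string text, bool truncated) = runOcr(budget);

        // Stop on success, or return the best effort once the ceiling is hit.
        if (!truncated || budget >= maxBudget) return (text, budget);

        budget = Math.Min(budget * 2, maxBudget);
    }
}

// Fake OCR call for demonstration: pretends the page needs 3000 tokens.
Func<int, (string Text, bool Truncated)> fakeOcr = b =>
    b >= 3000 ? ("full page text", false) : ("full page te", true);

var outcome = RunWithGrowingBudget(fakeOcr, startBudget: 1024, maxBudget: 8192);
Console.WriteLine($"Budget used: {outcome.BudgetUsed}, text: {outcome.Text}");
```

Each retry repeats the full OCR pass for that page, so this trades extra latency on long pages for a lower default limit on short ones.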

Next Steps