Extract Text from Images and Documents with VLM OCR
Invoices, receipts, ID cards, contracts, and scanned letters sit in nearly every enterprise pipeline as images or PDFs that downstream systems cannot read. LM-Kit.NET's VlmOcr engine, paired with the PaddleOCR VL 1.5 model, converts them into clean plain text on-device with a single API call. PaddleOCR VL 1.5 is a purpose-built 0.9B-parameter vision-language model that reports an overall score of 94.5% on OmniDocBench v1.5 while requiring only about 1 GB of VRAM. This tutorial walks through extracting text from single images, multi-page PDFs, and batch folders using the VlmOcrIntent.PlainText intent.
Why PaddleOCR VL for Document Text Extraction
Two practical advantages over traditional OCR engines:
- Robustness on real-world inputs. PaddleOCR VL handles skewed scans, phone-captured photos, low-resolution faxes, and mixed-language documents, typically without preprocessing. It was trained and benchmarked across five challenging scenarios: scanning, skew, warping, screen photography, and uneven illumination.
- Ultra-compact footprint. At 0.9B parameters and ~750 MB on disk, PaddleOCR VL runs on laptops, edge devices, and CI runners without a dedicated GPU. This makes it practical for always-on ingestion pipelines and on-device privacy-first workloads.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~1 GB (PaddleOCR VL 1.5) |
| Disk | ~750 MB free for model download |
Input formats: scanned PDF, PNG, JPEG, TIFF, BMP, WebP, DOCX, XLSX, PPTX, EML.
Step 1: Create the Project
```bash
dotnet new console -n OcrTextExtraction
cd OcrTextExtraction
dotnet add package LM-Kit.NET
```
Step 2: Extract Text from a Single Image
Load the PaddleOCR VL model and extract text from an image file with the PlainText intent:
```csharp
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Extract text using PlainText intent
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);
var attachment = new Attachment("scanned_receipt.png");

// Run OCR on the image and read back the recognized text
VlmOcr.VlmOcrResult result = ocr.Run(attachment);
string extractedText = result.PageElement.Text;
Console.WriteLine(extractedText);

// Optional: save to file
File.WriteAllText("receipt_text.txt", extractedText);
Console.WriteLine("\nSaved to receipt_text.txt");
```
The PlainText intent tells the engine to extract unformatted text. The engine maps the intent to the best available instruction for the loaded model (for example, "OCR:" for PaddleOCR VL), so you do not have to write model-specific prompts yourself. The same call works across document types: invoices, letters, forms, ID cards, labels, and more.
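The downloadingProgress callback in these examples prints a percentage only when the total download size is known. The sketch below pulls that arithmetic into a standalone helper that can be tested without downloading a model, and adds a fallback for an unknown size. It assumes, from the callback's use of len.HasValue and len.Value, that the total length is a nullable long and the bytes read a long; FormatDownloadProgress is a hypothetical name, not an LM-Kit.NET API.

```csharp
using System;

// Hypothetical helper (not an LM-Kit.NET API). Assumes the callback's total
// length is a long? (null when the size is not reported) and bytesRead a long.
static string FormatDownloadProgress(long? totalBytes, long bytesRead)
{
    if (totalBytes.HasValue && totalBytes.Value > 0)
    {
        double percent = (double)bytesRead / totalBytes.Value * 100;
        // FormattableString.Invariant keeps "." as the decimal separator on every locale.
        return FormattableString.Invariant($"Downloading: {percent:F1}%");
    }

    // Total size unknown: report whole megabytes received instead of a percentage.
    return FormattableString.Invariant($"Downloading: {bytesRead / (1024 * 1024)} MB");
}

Console.WriteLine(FormatDownloadProgress(750_000_000, 375_000_000));
Console.WriteLine(FormatDownloadProgress(null, 5L * 1024 * 1024));
```

Inside the real callback you would write "\r" + FormatDownloadProgress(len, read) to the console and return true as before.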
Step 3: Extract Text from a Multi-Page PDF
Process each page of a scanned PDF and concatenate the results:
```csharp
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Multi-page PDF extraction
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

string pdfPath = "scanned_contract.pdf";
var attachment = new Attachment(pdfPath);
int pageCount = attachment.PageCount;

Console.WriteLine($"Processing {pageCount} pages from {Path.GetFileName(pdfPath)}...\n");

var fullText = new StringBuilder();

for (int page = 0; page < pageCount; page++)
{
    Console.Write($" Page {page + 1}/{pageCount}... ");

    // Run OCR on one page at a time (pageIndex is zero-based)
    VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
    string pageText = pageResult.PageElement.Text;

    fullText.AppendLine($"--- Page {page + 1} ---");
    fullText.AppendLine(pageText);
    fullText.AppendLine();

    Console.WriteLine($"{pageResult.TextGeneration.GeneratedTokenCount} tokens generated");
}

string outputPath = Path.ChangeExtension(pdfPath, ".txt");
File.WriteAllText(outputPath, fullText.ToString());
Console.WriteLine($"\nSaved {pageCount} pages to {outputPath}");
```
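The page loop above mixes OCR calls with output formatting. A minimal sketch, assuming you want the same "--- Page N ---" layout, moves the formatting into a pure function that can be unit-tested without loading a model. JoinPages is a hypothetical helper; unlike the loop, it trims trailing whitespace from each page and writes "\n" explicitly, whereas StringBuilder.AppendLine emits Environment.NewLine ("\r\n" on Windows).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: formats per-page OCR output with "--- Page N ---" headers.
// Uses "\n" explicitly so the output is identical on Windows and Linux.
static string JoinPages(IReadOnlyList<string> pageTexts)
{
    var sb = new StringBuilder();
    for (int i = 0; i < pageTexts.Count; i++)
    {
        sb.Append("--- Page ").Append(i + 1).Append(" ---\n");
        // Drop trailing blank lines the model may emit so pages are separated consistently.
        sb.Append(pageTexts[i].TrimEnd()).Append("\n\n");
    }
    return sb.ToString();
}

// Stand-ins for pageResult.PageElement.Text values from the Step 3 loop.
string[] pages = { "ACME Corp\nInvoice 1042\n\n", "Total due: 420.00" };
Console.Write(JoinPages(pages));
```

In the Step 3 loop you would collect each pageText into a List<string> and call JoinPages once after the loop.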
Step 4: Batch Process a Folder of Documents
Convert an entire folder of scanned images and PDFs into text files:
```csharp
using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Batch process all documents
// ──────────────────────────────────────
string inputDir = "inbox";
string outputDir = "extracted_text";
Directory.CreateDirectory(outputDir);

// Keep only image and PDF files (extension compared in lowercase)
string[] supportedExtensions = { ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".pdf" };
string[] files = Directory.GetFiles(inputDir)
    .Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

Console.WriteLine($"Processing {files.Length} files...\n");

// One OCR engine instance is reused for every file
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText)
{
    MaximumCompletionTokens = 4096
};

var stopwatch = Stopwatch.StartNew();

foreach (string file in files)
{
    string fileName = Path.GetFileName(file);
    Console.Write($" {fileName}... ");

    var attachment = new Attachment(file);
    var text = new StringBuilder();

    // Images have a single page; PDFs are processed page by page
    for (int p = 0; p < attachment.PageCount; p++)
    {
        VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: p);
        text.AppendLine(pageResult.PageElement.Text);
    }

    string outPath = Path.Combine(outputDir, Path.ChangeExtension(fileName, ".txt"));
    File.WriteAllText(outPath, text.ToString());
    Console.WriteLine($"{attachment.PageCount} page(s) done");
}

stopwatch.Stop();
Console.WriteLine($"\nProcessed {files.Length} files in {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Output saved to {outputDir}/");
```
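Two details of the batch loop are worth hardening in production. The extension filter can be expressed as a case-insensitive HashSet lookup, and Path.ChangeExtension(fileName, ".txt") gives scan.png and scan.pdf the same output name, so the second file silently overwrites the first. The sketch below shows one way to handle both with plain .NET APIs; IsSupported and OutputNameFor are hypothetical helpers, and appending ".txt" to the full file name is an assumed naming convention, not an LM-Kit.NET requirement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Extensions from the Step 4 example plus ".tif" (an assumed addition for the
// short TIFF extension). OrdinalIgnoreCase makes ".PDF" and ".pdf" match.
var supported = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp", ".pdf"
};

static bool IsSupported(string path, ISet<string> extensions) =>
    extensions.Contains(Path.GetExtension(path));

// Hypothetical naming scheme: keep the source extension so "scan.png" and
// "scan.pdf" map to different outputs instead of both becoming "scan.txt".
static string OutputNameFor(string inputPath) =>
    Path.GetFileName(inputPath) + ".txt";

string[] inbox = { "scan.png", "scan.PDF", "notes.md", "photo.JPG" };
foreach (string file in inbox.Where(f => IsSupported(f, supported)))
{
    Console.WriteLine($"{file} -> {OutputNameFor(file)}");
}
```

With this scheme the batch loop would write Path.Combine(outputDir, OutputNameFor(file)); the trade-off is slightly longer output names in exchange for never losing a result.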
Step 5: Performance Metrics
Track throughput and quality for production monitoring:
```csharp
using System.Diagnostics;
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Run OCR and collect metrics
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.PlainText);
var attachment = new Attachment("document.png");

var stopwatch = Stopwatch.StartNew();
VlmOcr.VlmOcrResult result = ocr.Run(attachment);
stopwatch.Stop();

Console.WriteLine($"Tokens generated : {result.TextGeneration.GeneratedTokenCount}");
Console.WriteLine($"Time elapsed : {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($"Speed : {result.TextGeneration.TokenGenerationRate:F1} tokens/s");
Console.WriteLine($"Quality score : {result.TextGeneration.QualityScore:F2}");
Console.WriteLine($"Context usage : {result.TextGeneration.ContextTokens.Count}/{result.TextGeneration.ContextSize}");
Console.WriteLine($"Stop reason : {result.TextGeneration.TerminationReason}");
```
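For dashboards it can help to derive a few numbers from the raw metrics. The sketch below computes an end-to-end rate from the token count and stopwatch time, plus the share of the context window in use. It assumes TokenGenerationRate may cover only the generation phase, so a wall-clock rate that also absorbs image encoding is a useful complement; check the API reference to confirm. The helper names and the 0.9 alert threshold are hypothetical choices.

```csharp
using System;

// End-to-end throughput: generated tokens over wall-clock time, which
// includes any image encoding and prompt processing before generation.
static double EndToEndTokensPerSecond(int generatedTokens, TimeSpan elapsed) =>
    elapsed.TotalSeconds > 0 ? generatedTokens / elapsed.TotalSeconds : 0.0;

// Share of the context window in use, from 0.0 to 1.0.
static double ContextUtilization(int usedTokens, int contextSize) =>
    contextSize > 0 ? (double)usedTokens / contextSize : 0.0;

// Hypothetical alert threshold: flag pages that come close to filling the context.
const double NearFullThreshold = 0.9;

// Sample values standing in for GeneratedTokenCount, stopwatch.Elapsed,
// ContextTokens.Count and ContextSize from the Step 5 example.
double rate = EndToEndTokensPerSecond(600, TimeSpan.FromSeconds(12));
double usage = ContextUtilization(3072, 4096);

Console.WriteLine(FormattableString.Invariant($"End-to-end speed : {rate:F1} tokens/s"));
Console.WriteLine(FormattableString.Invariant($"Context usage    : {usage * 100:F0}%"));
Console.WriteLine(usage >= NearFullThreshold ? "Warning: context nearly full" : "Context OK");
```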
Industry Use Cases
| Industry | Document Type | What You Extract |
|---|---|---|
| Finance | Invoices, receipts, bank statements | Line items, totals, dates, account numbers |
| Healthcare | Prescriptions, lab reports, referral letters | Patient info, medication names, test results |
| Legal | Contracts, court filings, notarized documents | Clauses, parties, dates, signatures |
| Logistics | Shipping labels, packing slips, customs forms | Tracking numbers, addresses, weight, item counts |
| Insurance | Claim forms, policy documents, accident reports | Claim numbers, coverage details, descriptions |
| Government | ID cards, permits, tax forms | Names, ID numbers, addresses, filing data |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Output truncated mid-sentence | MaximumCompletionTokens too low | Increase to 4096 or higher |
| Blank or garbled output | Image too small or extremely low contrast | Resize or enhance image before processing |
| Mixed-language text partially recognized | Model defaults to dominant language | PaddleOCR VL handles 32 languages natively; ensure input resolution is adequate |
| Slow on CPU-only machines | No GPU acceleration; inference runs entirely on the CPU | Keep the default Q4_K_M quantization, confirm the CPU supports AVX2, and use a GPU if one is available |
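For the truncation row, one way to automate the fix is to compare the generated token count with the completion budget and retry with a larger one. This is a sketch under assumptions: the "budget fully used means truncated" heuristic and the doubling policy are illustrative, and the TerminationReason value printed in Step 5 is likely the more direct signal, although its possible values are not listed in this tutorial. LikelyTruncated and NextBudget are hypothetical helpers.

```csharp
using System;

// Heuristic (an assumption, not LM-Kit.NET behavior): if generation consumed the
// whole completion budget, the text was most likely cut off, not finished.
static bool LikelyTruncated(int generatedTokens, int maxCompletionTokens) =>
    generatedTokens >= maxCompletionTokens;

// Double the budget for a retry, capped at a ceiling you choose
// (for example, what still fits in the model's context window).
static int NextBudget(int currentBudget, int ceiling) =>
    Math.Min(currentBudget * 2, ceiling);

int budget = 2048;
int generated = 2048; // stand-in for TextGeneration.GeneratedTokenCount
if (LikelyTruncated(generated, budget))
{
    budget = NextBudget(budget, ceiling: 8192);
    Console.WriteLine($"Output likely truncated; retry with MaximumCompletionTokens = {budget}");
}
```

The retry itself would set MaximumCompletionTokens to the new budget and call ocr.Run again for the same page.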
Next Steps
- Extract Tables from Documents with VLM OCR: use VlmOcrIntent.TableRecognition for structured table output.
- Recognize Mathematical Formulas with VLM OCR: extract LaTeX from equations, homework, and textbooks.
- Extract Data from Charts and Graphs with VLM OCR: pull data from bar charts, pie charts, and line graphs.
- Convert Documents to Markdown with VLM OCR: produce Markdown instead of plain text using larger VLMs.
- Samples: VLM OCR Demo: interactive console demo with all OCR intents.