Extract Tables from Documents with VLM OCR
Financial statements, medical lab reports, shipping manifests, and procurement orders all rely on tabular data. When these documents arrive as scanned PDFs or photographs, extracting the table structure is one of the hardest problems in document processing. LM-Kit.NET's VlmOcr engine, combined with PaddleOCR VL's dedicated "Table Recognition:" instruction, detects rows, columns, headers, and merged cells in a single inference pass, with no layout heuristics or post-processing rules. This tutorial shows how to extract tables from images, PDFs, and mixed-content documents.
Why Dedicated Table Recognition Matters
Two practical advantages of PaddleOCR VL's table mode over generic OCR:
- Structural fidelity. Generic OCR reads text left-to-right, top-to-bottom; it cannot distinguish between a heading, a body cell, and a footer. PaddleOCR VL's "Table Recognition:" mode preserves row and column boundaries, merged cells, and header rows, producing output that maps directly to structured data.
- No template configuration. Traditional table extraction requires manually defining column positions, separator patterns, or anchor keywords for each document type. PaddleOCR VL generalizes across invoices, lab results, financial reports, and forms without any per-template setup.
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | ~1 GB (PaddleOCR VL 1.5) |
| Disk | ~750 MB free for model download |
Input formats: scanned PDF, PNG, JPEG, TIFF, BMP, WebP.
Step 1: Create the Project
dotnet new console -n TableExtraction
cd TableExtraction
dotnet add package LM-Kit.NET
Step 2: Extract a Table from an Image
Load the PaddleOCR VL model and use the "Table Recognition:" instruction to extract structured table data:
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Extract table using Table Recognition mode
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition);
var attachment = new Attachment("financial_statement.png");
VlmOcr.VlmOcrResult result = ocr.Run(attachment);
string tableOutput = result.PageElement.Text;
Console.WriteLine(tableOutput);
File.WriteAllText("extracted_table.txt", tableOutput);
Console.WriteLine("\nSaved to extracted_table.txt");
The "Table Recognition:" instruction activates PaddleOCR VL's specialized table detection pipeline. The model identifies table boundaries, column headers, row separators, and cell content, and returns the data in a structured format.
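The exact text the model returns depends on the document, but table rows are typically delimited line by line. As an illustration only, assuming pipe-delimited (Markdown-style) output, a minimal sketch that turns the text into rows of cells could look like this (ParseTable is a hypothetical helper, not part of LM-Kit.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Parse pipe-delimited table text into rows of trimmed cells.
// ParseTable is a hypothetical helper, not an LM-Kit.NET API.
static List<List<string>> ParseTable(string tableText)
{
    var rows = new List<List<string>>();
    foreach (string raw in tableText.Split('\n', StringSplitOptions.RemoveEmptyEntries))
    {
        string line = raw.Trim();
        if (!line.StartsWith('|')) continue;  // keep only table rows
        string stripped = line.Replace("|", "").Trim();
        if (stripped.Length == 0 || stripped.All(c => c == '-' || c == ':' || c == ' '))
            continue;                         // skip the header separator row
        rows.Add(line.Trim('|').Split('|').Select(c => c.Trim()).ToList());
    }
    return rows;
}

string sample = "| Account | Q1 | Q2 |\n|---|---|---|\n| Revenue | 1,200 | 1,450 |";
List<List<string>> rows = ParseTable(sample);
Console.WriteLine($"{rows.Count} rows x {rows[0].Count} columns"); // 2 rows x 3 columns
```

From here the rows can be mapped onto whatever record type your pipeline uses.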
Step 3: Extract Tables from a Multi-Page PDF
Financial reports and procurement documents often span multiple pages. Process each page and collect all tables:
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Multi-page table extraction
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition)
{
MaximumCompletionTokens = 4096
};
string pdfPath = "quarterly_report.pdf";
var attachment = new Attachment(pdfPath);
int pageCount = attachment.PageCount;
Console.WriteLine($"Scanning {pageCount} pages for tables...\n");
var allTables = new StringBuilder();
for (int page = 0; page < pageCount; page++)
{
Console.Write($" Page {page + 1}/{pageCount}... ");
VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
string pageContent = pageResult.PageElement.Text;
if (!string.IsNullOrWhiteSpace(pageContent))
{
allTables.AppendLine($"--- Table(s) from page {page + 1} ---");
allTables.AppendLine(pageContent);
allTables.AppendLine();
Console.WriteLine("table(s) found");
}
else
{
Console.WriteLine("no table detected");
}
}
File.WriteAllText("all_tables.txt", allTables.ToString());
Console.WriteLine("\nExtracted tables saved to all_tables.txt");
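When a single table continues onto the next page, the loop above produces two fragments. A minimal merge sketch, assuming pipe-delimited rows and a repeated header on the continuation page (MergeTableFragments is a hypothetical helper, not an LM-Kit.NET API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Merge table fragments from consecutive pages: keep the first fragment intact,
// then append only the data rows of later fragments, dropping any repeated
// header or separator row.
static string MergeTableFragments(IReadOnlyList<string> fragments)
{
    if (fragments.Count == 0) return string.Empty;

    var merged = new List<string>(
        fragments[0].Split('\n', StringSplitOptions.RemoveEmptyEntries));
    string header = merged[0].Trim();

    foreach (string fragment in fragments.Skip(1))
    {
        foreach (string raw in fragment.Split('\n', StringSplitOptions.RemoveEmptyEntries))
        {
            string line = raw.Trim();
            bool isSeparator = line.Replace("|", "").Trim().All(c => c == '-' || c == ':');
            if (line == header || isSeparator) continue; // drop repeated header/separator
            merged.Add(line);
        }
    }
    return string.Join('\n', merged);
}

string page1 = "| Item | Qty |\n|---|---|\n| Bolts | 40 |";
string page2 = "| Item | Qty |\n|---|---|\n| Nuts | 120 |";
string merged = MergeTableFragments(new[] { page1, page2 });
Console.WriteLine(merged);
```

The header comparison is deliberately strict (exact match after trimming); loosen it if your scans introduce small OCR variations between pages.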
Step 4: Combine General OCR with Table Extraction
Many documents contain both free-form text and tables. Run two passes: one for general text and one for tables:
using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl:0.9b",
downloadingProgress: (_, len, read) =>
{
if (len.HasValue) Console.Write($"\r Downloading: {(double)read / len.Value * 100:F1}% ");
return true;
},
loadingProgress: p => { Console.Write($"\r Loading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Two-pass extraction: text + tables
// ──────────────────────────────────────
var attachment = new Attachment("invoice.png");
// Pass 1: General text extraction
var textOcr = new VlmOcr(model, VlmOcrIntent.PlainText);
VlmOcr.VlmOcrResult textResult = textOcr.Run(attachment);
Console.WriteLine("=== Full Document Text ===");
Console.WriteLine(textResult.PageElement.Text);
// Pass 2: Focused table extraction
var tableOcr = new VlmOcr(model, VlmOcrIntent.TableRecognition);
VlmOcr.VlmOcrResult tableResult = tableOcr.Run(attachment);
Console.WriteLine("\n=== Extracted Table(s) ===");
Console.WriteLine(tableResult.PageElement.Text);
This two-pass approach is useful for invoices, where you need both the header information (vendor, date, invoice number) and the line-item table.
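Once both passes complete, the plain-text result can be mined for those header fields. A minimal sketch using a regular expression, assuming "Label: value" lines in the OCR output (FindField and the sample labels are illustrative, not an LM-Kit.NET API):

```csharp
using System;
using System.Text.RegularExpressions;

// Pull a labeled field out of the plain-text OCR pass. FindField and the
// sample labels below are illustrative; adapt the labels to your documents.
static string FindField(string text, string label)
{
    Match match = Regex.Match(
        text,
        Regex.Escape(label) + @"\s*[:#]?\s*(.+)",  // label, optional ':' or '#', value
        RegexOptions.IgnoreCase);
    return match.Success ? match.Groups[1].Value.Trim() : null;
}

string fullText = "Acme Corp\nInvoice No: INV-2041\nDate: 2025-03-14";
Console.WriteLine(FindField(fullText, "Invoice No")); // INV-2041
Console.WriteLine(FindField(fullText, "Date"));       // 2025-03-14
```

For production use, LM-Kit's structured data extraction (see Next Steps) is a more robust alternative to hand-written regexes.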
Industry Use Cases
| Industry | Document Type | What You Extract |
|---|---|---|
| Finance | Income statements, balance sheets, trial balances | Account names, amounts, period columns |
| Healthcare | Lab reports, vital signs logs, medication schedules | Test names, reference ranges, values, dates |
| Procurement | Purchase orders, packing slips, price lists | Item codes, quantities, unit prices, totals |
| Insurance | Coverage comparison tables, benefit schedules | Plan names, limits, deductibles, copays |
| Logistics | Customs declarations, bill of lading tables | HS codes, weights, quantities, origins |
| Education | Grade sheets, timetables, exam results | Student names, subjects, scores, credits |
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Columns misaligned in output | Low-resolution scan or extreme skew | Improve scan quality; consider preprocessing with ImageBuffer.Deskew() |
| Merged cells not detected | Complex multi-level headers | Increase MaximumCompletionTokens to give the model room for verbose output |
| Table output mixed with body text | Document has text above and below the table | Use the "Table Recognition:" mode, which focuses specifically on table regions |
| Partial table on page boundary | Table spans two PDF pages | Extract tables from consecutive pages and merge programmatically |
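Most downstream tools expect CSV rather than raw table text. Assuming pipe-delimited output, a small conversion sketch that also handles cells containing commas (PipeTableToCsv is a hypothetical helper, not an LM-Kit.NET API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Convert pipe-delimited table text to CSV for spreadsheets and downstream
// tools, quoting cells that contain commas or double quotes.
static string PipeTableToCsv(string tableText)
{
    var csvLines = new List<string>();
    foreach (string raw in tableText.Split('\n', StringSplitOptions.RemoveEmptyEntries))
    {
        string line = raw.Trim();
        if (!line.StartsWith('|')) continue;
        string stripped = line.Replace("|", "").Trim();
        if (stripped.Length == 0 || stripped.All(c => c == '-' || c == ':' || c == ' '))
            continue; // skip the header separator row
        IEnumerable<string> cells = line.Trim('|').Split('|')
            .Select(c => c.Trim())
            .Select(c => c.Contains(',') || c.Contains('"')
                ? "\"" + c.Replace("\"", "\"\"") + "\""  // RFC 4180-style quoting
                : c);
        csvLines.Add(string.Join(',', cells));
    }
    return string.Join('\n', csvLines);
}

string table = "| Item | Price |\n|---|---|\n| Widget, large | 9.50 |";
string csv = PipeTableToCsv(table);
Console.WriteLine(csv);
```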
Next Steps
- Extract Text from Images and Documents with VLM OCR: general OCR for full-page text extraction.
- Extract Invoice Data from PDFs and Images: combine table extraction with structured data extraction for invoices.
- Extract Structured Data from Unstructured Text: parse the extracted table text into typed fields and objects.
- Samples: VLM OCR Demo: interactive console demo with all OCR intents.