Extract Tables from Documents with VLM OCR

Financial statements, medical lab reports, shipping manifests, and procurement orders all rely on tabular data. When these documents arrive as scanned PDFs or photographs, extracting the table structure is one of the hardest problems in document processing. LM-Kit.NET's VlmOcr engine, combined with PaddleOCR VL's dedicated Table Recognition: instruction, detects rows, columns, headers, and merged cells in a single inference pass without any layout heuristics or post-processing rules. This tutorial shows how to extract tables from images, PDFs, and mixed-content documents.

Why Dedicated Table Recognition Matters

PaddleOCR VL's table mode has two practical advantages over generic OCR:

Structural fidelity. Generic OCR reads text left-to-right, top-to-bottom. It cannot distinguish between a heading, a body cell, and a footer. PaddleOCR VL's Table Recognition: mode preserves row and column boundaries, merged cells, and header rows, producing output that maps directly to structured data.
No template configuration. Traditional table extraction requires manually defining column positions, separator patterns, or anchor keywords for each document type. PaddleOCR VL generalizes across invoices, lab results, financial reports, and forms without any per-template setup.

Prerequisites

Requirement	Minimum
.NET SDK	8.0+
VRAM	~1 GB (PaddleOCR VL 1.6)
Disk	~750 MB free for model download

Input formats: scanned PDF, PNG, JPEG, TIFF, BMP, WebP.

Step 1: Create the Project

dotnet new console -n TableExtraction
cd TableExtraction
dotnet add package LM-Kit.NET

Step 2: Extract a Table from an Image

Load the PaddleOCR VL model and create a VlmOcr instance with VlmOcrIntent.TableRecognition, the intent that corresponds to the Table Recognition: instruction, to extract structured table data:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Extract table using Table Recognition mode
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition);

var attachment = new Attachment("financial_statement.png");

VlmOcr.VlmOcrResult result = ocr.Run(attachment);

string tableOutput = result.PageElement.Text;
Console.WriteLine(tableOutput);

File.WriteAllText("extracted_table.txt", tableOutput);
Console.WriteLine("\nSaved to extracted_table.txt");

The Table Recognition: instruction activates PaddleOCR VL's specialized table detection pipeline. The model identifies table boundaries, column headers, row separators, and cell content, and returns the data in a structured format.
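Once the table text is in hand, a common next step is to split it into rows and cells. The following is a minimal sketch, not part of the LM-Kit.NET API. It assumes the recognized table arrives as pipe-delimited (Markdown-style) rows, which this tutorial does not confirm, so inspect your own `PageElement.Text` output and adapt the splitting logic. `ParsePipeTable`, `ToCsvCell`, and the sample data are hypothetical names and values used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ASSUMPTION: pipe-delimited rows. This sample stands in for result.PageElement.Text;
// the real output format may differ and is not specified by the tutorial.
string sampleOutput = string.Join("\n",
    "| Account | Q1 | Q2 |",
    "|---|---|---|",
    "| Revenue | 1,200 | 1,350 |",
    "| Costs | 800 | 790 |");

// Split pipe-delimited text into rows of trimmed cells, skipping separator rows.
static List<string[]> ParsePipeTable(string text)
{
    var parsed = new List<string[]>();
    foreach (string rawLine in text.Split('\n'))
    {
        string line = rawLine.Trim();
        if (!line.StartsWith('|')) continue; // not a table row

        // Remove exactly one leading pipe and, if present, one trailing pipe.
        string inner = line.Length > 1 && line.EndsWith('|') ? line[1..^1] : line[1..];
        string[] cells = inner.Split('|').Select(c => c.Trim()).ToArray();

        // A separator row contains only dashes and colons in every cell.
        bool isSeparator = cells.All(c => c.Length > 0 && c.All(ch => ch == '-' || ch == ':'));
        if (!isSeparator) parsed.Add(cells);
    }
    return parsed;
}

// Quote a cell for CSV when it contains a comma, a double quote, or a newline.
static string ToCsvCell(string cell) =>
    cell.IndexOfAny(new[] { ',', '"', '\n' }) >= 0
        ? "\"" + cell.Replace("\"", "\"\"") + "\""
        : cell;

List<string[]> tableRows = ParsePipeTable(sampleOutput);
string csv = string.Join("\n", tableRows.Select(r => string.Join(",", r.Select(ToCsvCell))));
Console.WriteLine(csv);
```

Quoting matters here because amounts such as 1,200 contain the CSV delimiter; without it, a spreadsheet would read one amount as two columns.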

Step 3: Extract Tables from a Multi-Page PDF

Financial reports and procurement documents often span multiple pages. Process each page and collect all tables:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Multi-page table extraction
// ──────────────────────────────────────
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition)
{
    MaximumCompletionTokens = 4096
};

string pdfPath = "quarterly_report.pdf";
var attachment = new Attachment(pdfPath);

int pageCount = attachment.PageCount;
Console.WriteLine($"Scanning {pageCount} pages for tables...\n");

var allTables = new StringBuilder();

for (int page = 0; page < pageCount; page++)
{
    Console.Write($"  Page {page + 1}/{pageCount}... ");

    VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
    string pageContent = pageResult.PageElement.Text;

    if (!string.IsNullOrWhiteSpace(pageContent))
    {
        allTables.AppendLine($"--- Table(s) from page {page + 1} ---");
        allTables.AppendLine(pageContent);
        allTables.AppendLine();
        Console.WriteLine("table(s) found");
    }
    else
    {
        Console.WriteLine("no table detected");
    }
}

File.WriteAllText("all_tables.txt", allTables.ToString());
Console.WriteLine("\nExtracted tables saved to all_tables.txt");
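
The listing above writes every page into one file, separated by `--- Table(s) from page N ---` marker lines. If a later stage needs the tables per page again, the file can be split on those markers. The sketch below is one possible way to do that; `SplitByPage` is a hypothetical helper, and the table bodies in the sample are placeholders rather than real model output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Sample content in the layout that the multi-page listing writes to all_tables.txt.
// In a real run you would use File.ReadAllText("all_tables.txt") instead.
string saved = string.Join("\n",
    "--- Table(s) from page 1 ---",
    "first table body",
    "",
    "--- Table(s) from page 3 ---",
    "second table body, line 1",
    "second table body, line 2",
    "");

// Group every line under the most recent page marker, keyed by page number.
static Dictionary<int, string> SplitByPage(string text)
{
    var marker = new Regex(@"^--- Table\(s\) from page (\d+) ---$");
    var buffers = new Dictionary<int, StringBuilder>();
    int currentPage = -1; // -1 means no marker has been seen yet

    foreach (string rawLine in text.Split('\n'))
    {
        string line = rawLine.TrimEnd('\r'); // tolerate Windows line endings
        Match match = marker.Match(line);
        if (match.Success)
        {
            currentPage = int.Parse(match.Groups[1].Value);
            buffers[currentPage] = new StringBuilder();
        }
        else if (currentPage >= 0)
        {
            buffers[currentPage].Append(line).Append('\n');
        }
    }

    var byPage = new Dictionary<int, string>();
    foreach (var pair in buffers)
        byPage[pair.Key] = pair.Value.ToString().Trim();
    return byPage;
}

Dictionary<int, string> tablesByPage = SplitByPage(saved);
foreach (var entry in tablesByPage)
    Console.WriteLine($"Page {entry.Key}: {entry.Value.Split('\n').Length} line(s)");
```

Keeping the page number as the key preserves the link back to the source PDF, which helps when a reviewer needs to check an extracted value against the original page.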

Step 4: Combine General OCR with Table Extraction

Many documents contain both free-form text and tables. Run two passes, one for general text and one for tables:

using System.Text;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load PaddleOCR VL model
// ──────────────────────────────────────
Console.WriteLine("Loading PaddleOCR VL model...");
using LM model = LM.LoadFromModelID("paddleocr-vl-1.6:0.9b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Two-pass extraction: text + tables
// ──────────────────────────────────────
var attachment = new Attachment("invoice.png");

// Pass 1: General text extraction
var textOcr = new VlmOcr(model, VlmOcrIntent.PlainText);
VlmOcr.VlmOcrResult textResult = textOcr.Run(attachment);
Console.WriteLine("=== Full Document Text ===");
Console.WriteLine(textResult.PageElement.Text);

// Pass 2: Focused table extraction
var tableOcr = new VlmOcr(model, VlmOcrIntent.TableRecognition);
VlmOcr.VlmOcrResult tableResult = tableOcr.Run(attachment);
Console.WriteLine("\n=== Extracted Table(s) ===");
Console.WriteLine(tableResult.PageElement.Text);

This two-pass approach is useful for invoices where you need both the header information (vendor, date, invoice number) and the line-item table.
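
As an illustration of using the first pass, the header fields can be pulled out of the plain-text result with simple label matching. This is a rough sketch that assumes the invoice prints each field as `Label: value` on its own line; real invoices vary widely, and a structured extraction step is usually more robust. `FindField` and the sample text are hypothetical and stand in for `textResult.PageElement.Text`.

```csharp
using System;
using System.Text.RegularExpressions;

// Placeholder for the plain-text pass; real OCR output will differ.
string documentText = string.Join("\n",
    "ACME Supplies Ltd.",
    "Invoice Number: INV-2024-0042",
    "Date: 2024-03-15",
    "Bill To: Example Corp");

// Return the value that follows "label:" at the start of a line, or null when absent.
static string? FindField(string text, string label)
{
    Match match = Regex.Match(
        text,
        @"^\s*" + Regex.Escape(label) + @"\s*:\s*(.+?)\s*$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);
    return match.Success ? match.Groups[1].Value : null;
}

string? invoiceNumber = FindField(documentText, "Invoice Number");
string? invoiceDate = FindField(documentText, "Date");

Console.WriteLine($"Invoice number: {invoiceNumber ?? "(not found)"}");
Console.WriteLine($"Date: {invoiceDate ?? "(not found)"}");
```

Anchoring the label at the start of the line keeps a field such as "Due Date" from being mistaken for "Date".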

Industry Use Cases

Industry	Document Type	What You Extract
Finance	Income statements, balance sheets, trial balances	Account names, amounts, period columns
Healthcare	Lab reports, vital signs logs, medication schedules	Test names, reference ranges, values, dates
Procurement	Purchase orders, packing slips, price lists	Item codes, quantities, unit prices, totals
Insurance	Coverage comparison tables, benefit schedules	Plan names, limits, deductibles, copays
Logistics	Customs declarations, bill of lading tables	HS codes, weights, quantities, origins
Education	Grade sheets, timetables, exam results	Student names, subjects, scores, credits

Common Issues

Problem	Cause	Fix
Columns misaligned in output	Low-resolution scan or extreme skew	Improve scan quality; consider preprocessing with `ImageBuffer.Deskew()`
Merged cells not detected	Complex multi-level headers	Increase `MaximumCompletionTokens` to give the model room for verbose output
Table output mixed with body text	Document has text above and below the table	Use `Table Recognition:` mode, which focuses specifically on table regions
Partial table on page boundary	Table spans two PDF pages	Extract tables from consecutive pages and merge programmatically
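
For the page-boundary case, merging can be as simple as appending the continuation rows and dropping a header row that is repeated on the second page. The sketch below assumes each page's table has already been split into rows of cells by your own parsing code; `MergeContinuation` and the sample rows are hypothetical, and the header check is a plain cell-by-cell comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows, already split into cells, from two consecutive pages.
var firstPage = new List<string[]>
{
    new[] { "Item", "Qty", "Price" },
    new[] { "A-100", "2", "9.50" },
};
var secondPage = new List<string[]>
{
    new[] { "Item", "Qty", "Price" }, // header repeated on the continuation page
    new[] { "B-200", "1", "4.00" },
};

// Append the continuation rows, skipping the first one when it repeats the header.
static List<string[]> MergeContinuation(List<string[]> head, List<string[]> tail)
{
    var merged = new List<string[]>(head);
    bool repeatsHeader = head.Count > 0 && tail.Count > 0 && head[0].SequenceEqual(tail[0]);
    merged.AddRange(repeatsHeader ? tail.Skip(1) : tail);
    return merged;
}

List<string[]> mergedTable = MergeContinuation(firstPage, secondPage);
foreach (string[] row in mergedTable)
    Console.WriteLine(string.Join(" | ", row));
```

Before merging real documents, it is worth checking that both pages have the same column count, since a mismatch usually means the second table is a different table rather than a continuation.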

Next Steps

Extract Text from Images and Documents with VLM OCR: general OCR for full-page text extraction.
Extract Invoice Data from PDFs and Images: combine table extraction with structured data extraction for invoices.
Extract Structured Data from Unstructured Text: parse the extracted table text into typed fields and objects.
Samples: VLM OCR Demo: interactive console demo with all OCR intents.
