
👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document-intelligence/layout-understanding/page_layout_inspector

Page Layout Inspector for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that runs VLM-OCR on a page image and inspects the layout tree returned by the SDK. It reports the number of text elements, lines, and paragraphs, previews the first N lines with their bounding-box coordinates, and can optionally export every line as CSV for downstream tooling.

All processing runs on-device.


👥 Industry Target Audience

  • Document AI teams designing column-aware search, region-based extraction, or layout-preserving exports.
  • Reviewer UIs that need to draw rectangles around text on a rendered page.
  • Forms / tables: starting point for table reconstruction.
  • Redaction: position-accurate redaction needs the layout primitives.
  • Quality assurance: validate that an OCR model is segmenting pages correctly before downstream consumption.

🚀 Problem Solved

Many OCR tools return a flat blob of text and leave you to reverse-engineer the column and paragraph structure yourself. VlmOcr.Run(...) produces a PageElement tree with helpers (DetectLines, DetectParagraphs) and bounding boxes already attached. The demo exposes that surface area behind a menu so you can spot-check the layout before wiring it into a pipeline.


💻 Application Overview

Interactive menu (no command-line arguments) with two inspection modes and a Quit option:

Mode     What it does
Image    Inspect one page image. Prompts for preview line count and an optional CSV path.
Folder   Inspect every supported image in a folder; write one <basename>.lines.csv per file to a chosen output directory.
Quit     Exit.
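
The <basename>.lines.csv naming used by Folder mode can be sketched as below. This is a minimal illustration and not the demo's actual source: the LinesCsvPaths class, its method names, and the extension list are hypothetical, since the set of image formats the demo accepts is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; not part of the LM-Kit SDK or the demo source.
public static class LinesCsvPaths
{
    // Assumed list of image extensions; the demo's real list may differ.
    private static readonly HashSet<string> SupportedExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff" };

    // Builds "<outputDirectory>/<basename>.lines.csv" for one image path
    // (the separator is whatever Path.Combine uses on the current platform).
    public static string ForImage(string imagePath, string outputDirectory) =>
        Path.Combine(outputDirectory, Path.GetFileNameWithoutExtension(imagePath) + ".lines.csv");

    // Pairs every supported image in a folder with its CSV output path,
    // in ordinal file-name order so repeated runs process files identically.
    public static IEnumerable<(string Image, string Csv)> ForFolder(string inputDirectory, string outputDirectory) =>
        Directory.EnumerateFiles(inputDirectory)
            .Where(p => SupportedExtensions.Contains(Path.GetExtension(p)))
            .OrderBy(p => p, StringComparer.Ordinal)
            .Select(p => (p, ForImage(p, outputDirectory)));
}
```

With this scheme, two images that share a basename but differ in extension (page1.png and page1.jpg) map to the same CSV path, so a real implementation would need to decide how to handle that collision.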

Each inspection prints text-element, line, and paragraph counts. The CSV columns are index,top,left,bottom,right,text.
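
A row in that column order could be produced roughly as follows. This is a sketch under stated assumptions rather than the demo's code: LineRow and LinesCsv are hypothetical names, the coordinates are assumed to be floating-point values, and the SDK's LineElement type is deliberately not used so the example stays self-contained. Quoting follows the usual RFC 4180 convention so that recognized text containing commas or quotes does not break the columns.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical stand-in for one detected line (coordinate type is an assumption).
public sealed record LineRow(int Index, double Top, double Left, double Bottom, double Right, string Text);

// Hypothetical writer; not part of the LM-Kit SDK.
public static class LinesCsv
{
    public const string Header = "index,top,left,bottom,right,text";

    // Wraps a field in quotes and doubles embedded quotes when it contains
    // a comma, a quote, or a line break; otherwise returns it unchanged.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Formats numbers with the invariant culture so the decimal separator
    // is always '.' regardless of the machine's regional settings.
    public static string FormatRow(LineRow row)
    {
        var inv = CultureInfo.InvariantCulture;
        return string.Join(",",
            row.Index.ToString(inv),
            row.Top.ToString(inv),
            row.Left.ToString(inv),
            row.Bottom.ToString(inv),
            row.Right.ToString(inv),
            Escape(row.Text));
    }

    // Writes the header followed by one row per line.
    public static void Write(TextWriter writer, IEnumerable<LineRow> rows)
    {
        writer.WriteLine(Header);
        foreach (var row in rows) writer.WriteLine(FormatRow(row));
    }
}
```

For example, LinesCsv.Write(Console.Out, rows) prints the header and then one escaped row per element of rows; passing a StreamWriter instead sends the same output to a file.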

✨ Key Features

  • VlmOcr.Run(ImageBuffer) returning a VlmOcrResult.
  • PageElement.DetectLines() / DetectParagraphs() as layout primitives.
  • LineElement.Bounds / Text: ready-to-use coordinates and content.
  • CSV export: portable input for review UIs, table reconstruction, redaction.

🧠 Model

  • paddleocr-vl-1.6:0.9b (fast, low-VRAM VLM-OCR).

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document-intelligence/layout-understanding/page_layout_inspector
dotnet run

Pick a mode from the menu and follow the prompts.
