👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document_to_markdown

Document-to-Markdown Universal Conversion Engine for C# .NET Applications


🎯 Purpose of the Demo

Document to Markdown showcases DocumentToMarkdown, LM-Kit.NET's state-of-the-art universal conversion engine. In a single API it replaces a whole stack of legacy components (PDF text extractors, Tesseract-style OCR, DOCX/XLSX parsers, email parsers, HTML-to-Markdown libraries) and turns any office file, email, web page, PDF, or image into clean, LLM-ready Markdown that keeps headings, tables, lists, code blocks, and reading order intact.

Everything runs 100% on-device: no cloud round trips, no per-page pricing, no data leaving your infrastructure.

The sample shows how to:

  • Build a DocumentToMarkdown instance with or without a vision model.
  • Switch between the three conversion strategies (Hybrid, TextExtraction, VlmOcr) and watch how the effective strategy is resolved per page.
  • Feed heterogeneous inputs (PDF, DOCX, PPTX, XLSX, EML, MBOX, HTML, TXT, images).
  • Subscribe to the live PageStarting and PageCompleted events for streaming progress and per-page diagnostics.
  • Emit YAML front matter, configure page separators, rewrite non-nested HTML tables into GitHub-flavored Markdown, and pick arbitrary page ranges.
  • Write the final Markdown straight to disk with ConvertToFile.
  • Plug a traditional OCR engine (LMKitOcr) into the TextExtraction strategy to cover image inputs, enrich PDF pages with OCR of their embedded raster images (charts, figure legends, scanned tables), and fall back to full-page OCR on scanned PDFs — all without a vision model.
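The first two bullets come down to a constructor choice. A minimal sketch, using the same constructor shapes as the snippets further down in this README:

```csharp
using LMKit.Document.Conversion;
using LMKit.Model;

// Without a model: nothing is loaded until a vision-dependent page
// is encountered (lazy lightonocr-2:1b fallback).
var lazyConverter = new DocumentToMarkdown();

// With an explicit vision model, loaded up front:
using var model = LM.LoadFromModelID("lightonocr-2:1b");
var eagerConverter = new DocumentToMarkdown(model);
```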

👥 Target Audience

  • Platform and Backend Engineers: add a single, unified "to-Markdown" step to any .NET ingestion or AI pipeline.
  • RAG and Knowledge Base Builders: produce the Markdown corpus that powers embeddings, search, and grounded generation.
  • Document Automation Teams: replace a legacy stack of PDF, DOCX, OCR, and email parsers with a single, governed component.
  • Compliance-sensitive Organizations: convert sensitive documents without sending them to a third-party API.

🚀 Problem Solved

  • One engine, every format. PDF, DOCX, PPTX, XLSX, EML, MBOX, HTML, TXT, and every common raster image (PNG, JPG, TIFF, BMP, WEBP, GIF).
  • Mixed-content PDFs, solved. The Hybrid strategy keeps born-digital pages on the fast text-layer path and automatically escalates scanned or image-heavy pages to vision OCR, with no pre-classification required from the caller.
  • Structural fidelity. Headings, tables, lists, code blocks, and reading order survive the round trip. Email headers and HTML structure are preserved by dedicated format-aware converters.
  • Deterministic fast path. When every page has a clean text layer, no model is loaded and conversion is CPU-only and deterministic.
  • Zero-config startup. Omit the model and DocumentToMarkdown will lazily load the bundled lightonocr-2:1b specialist only if a vision-dependent page is encountered.
  • Streaming observability. PageStarting and PageCompleted let you build progress bars, cancel mid-flight, or log per-page strategy, elapsed time, and quality score.

💻 Sample Application Description

Interactive console app that:

  1. Picks a strategy: Hybrid (default, recommended), TextExtraction, or VlmOcr.
  2. Loads a vision model (only if vision may be needed). Default is LightOnOCR 2 1B (~2 GB VRAM), with nine additional model options and custom-URI support.
  3. Builds a DocumentToMarkdown converter and hooks into the PageStarting and PageCompleted events to print a live per-page log.
  4. Runs a conversion loop that prompts for:
    • a document path (PDF, DOCX, PPTX, XLSX, EML, MBOX, HTML, TXT, or image),
    • an optional page range (1-5,7),
    • an optional .md output path.
  5. Prints the full Markdown (or a preview plus on-disk path) and a summary block with the requested/effective strategy, per-page breakdown (text vs VLM), total VLM tokens, character count, and elapsed times.
  6. Auto-detects image-only inputs when running the TextExtraction strategy and wires an LMKitOcr instance for that run so TextExtraction remains usable without a vision model.
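A stripped-down sketch of the conversion loop (steps 4-5), using only option names that appear elsewhere in this README; the real sample adds validation, the Markdown preview, and the summary block, and passing null for PageRange to mean "all pages" is an assumption here:

```csharp
using LMKit.Document.Conversion;

var converter = new DocumentToMarkdown();

while (true)
{
    Console.Write("Document path (q to quit): ");
    string? path = Console.ReadLine();
    if (string.IsNullOrEmpty(path) || path == "q") break;

    Console.Write("Page range (blank = all): ");
    string? range = Console.ReadLine();

    var result = converter.Convert(path, new DocumentToMarkdownOptions
    {
        Strategy  = DocumentToMarkdownStrategy.Hybrid,
        PageRange = string.IsNullOrWhiteSpace(range) ? null : range  // assumed: null = every page
    });

    Console.WriteLine(result.Markdown);
}
```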

✨ Key Features

  • 🧠 Universal engine: one API for every supported format.
  • 🔀 Hybrid routing: per-page decision between text extraction and VLM OCR.
  • 📩 Format-aware specialists: EML, MBOX, HTML, and DOCX are converted in a single pass by dedicated converters that preserve email headers, HTML structure, and DOCX tables.
  • 📏 Page ranges: convert "1-5, 7, 9-12" of a 500-page PDF.
  • 📊 Rich telemetry per page: StrategyUsed, Elapsed, GeneratedTokenCount, QualityScore, Warning, HasExtractableText.
  • 📝 YAML front matter and page separators: ready for LLM ingestion or static-site pipelines.
  • 📦 Lazy model loading: no model is downloaded or loaded until a VLM page actually needs it.
  • 🛡️ Local-first: nothing leaves the process.

🧭 Strategy Matrix

| Strategy | Model Needed | Best For | Speed |
| --- | --- | --- | --- |
| TextExtraction | No (or LMKitOcr for OCR paths) | Born-digital PDFs, DOCX, XLSX, PPTX, HTML, EML, MBOX | 🔥 Fastest |
| VlmOcr | Vision model | Scans, photos, handwriting, layout-heavy pages | 🐢 Slowest |
| Hybrid (recommended) | Vision model (lazy) | Mixed PDFs (born-digital plus scanned), unknown corpora | ⚡ Adaptive |

TextExtraction becomes a full OCR pipeline the moment you set options.OcrEngine: it transcribes image attachments, enriches PDF pages with OCR of their embedded raster images, and falls back to full-page OCR on scanned PDFs — no language model required.


🧰 Built-In Models (menu)

On startup, the sample exposes a vision-model menu (only prompted when vision may be used):

| Option | Model | Approx. VRAM |
| --- | --- | --- |
| 0 | LightOn LightOnOCR 2 1B (★ default) | ~2 GB |
| 1 | Z.ai GLM-OCR 0.9B | ~1 GB |
| 2 | Z.ai GLM-V 4.6 Flash 10B | ~7 GB |
| 3 | MiniCPM o 4.5 9B | ~5.9 GB |
| 4 | Alibaba Qwen 3.5 2B | ~2 GB |
| 5 | Alibaba Qwen 3.5 4B | ~3.5 GB |
| 6 | Alibaba Qwen 3.5 9B | ~7 GB |
| 7 | Google Gemma 4 E4B | ~6 GB |
| 8 | Alibaba Qwen 3.5 27B | ~18 GB |
| 9 | Mistral Ministral 3 8B | ~6.5 GB |
| other | Custom model URI | depends |

💻 Minimal Integration Snippet

using LMKit.Document.Conversion;
using LMKit.Model;

// Zero-config: lightonocr-2:1b is loaded lazily only if a VLM page is encountered.
var converter = new DocumentToMarkdown();

converter.PageStarting  += (_, e) => Console.WriteLine($"Page {e.PageNumber}/{e.PageCount}  [{e.PlannedStrategy}]");
converter.PageCompleted += (_, e) =>
{
    if (e.PageResult != null)
    {
        Console.WriteLine($"Page {e.PageResult.PageNumber} in {e.PageResult.Elapsed.TotalMilliseconds:F0} ms " +
                          $"[{e.PageResult.StrategyUsed}]");
    }
};

var result = converter.Convert("report.pdf", new DocumentToMarkdownOptions
{
    Strategy                         = DocumentToMarkdownStrategy.Hybrid,
    PageRange                        = "1-10",
    EmitFrontMatter                  = true,
    PreferMarkdownTablesForNonNested = true
});

File.WriteAllText("report.md", result.Markdown);

foreach (var page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber}: {page.StrategyUsed}  {page.Elapsed.TotalMilliseconds:F0} ms");
}

Bring-your-own model

using var model = LM.LoadFromModelID("lightonocr-2:1b");
var converter = new DocumentToMarkdown(model);

Convert straight to disk

await converter.ConvertToFileAsync("invoice.pdf", "invoice.md",
    new DocumentToMarkdownOptions { Strategy = DocumentToMarkdownStrategy.Hybrid });

Pure TextExtraction with traditional OCR

Supplying OcrEngine extends TextExtraction at three points: standalone image inputs are transcribed, embedded raster images on each PDF page are OCRed and merged into the page layout (chart labels, figure legends), and scanned PDFs fall back to a full-page OCR pass. The whole pipeline runs with no language model loaded at all — the leanest possible OCR deployment.

using LMKit.Extraction.Ocr;
using LMKit.Document.Conversion;

using var ocr = new LMKitOcr();
var converter = new DocumentToMarkdown();

var result = converter.Convert("invoice.png", new DocumentToMarkdownOptions
{
    Strategy            = DocumentToMarkdownStrategy.TextExtraction,
    OcrEngine           = ocr,
    OcrImageParallelism = 4        // concurrent OCR calls per page (1..12)
});

The per-page pipeline caps at 20 images per page (a DoS guard against pathological PDFs); any page beyond that limit is transparently handled by the full-page OCR fallback instead of spawning an unbounded number of per-image calls.


🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later
  • ~2 GB VRAM if a vision strategy is selected (default model: lightonocr-2:1b)
  • No VRAM needed when running TextExtraction on paginated formats (PDF, DOCX, XLSX, PPTX, EML, MBOX, HTML, TXT)

📥 Download

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document_to_markdown

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a strategy (0 = Hybrid, 1 = TextExtraction, 2 = VlmOcr).
  2. If vision may be used, select a vision model or paste a custom URI.
  3. Enter a document path, an optional page range, and an optional output .md path.
  4. Read the per-page log, the Markdown preview, and the conversion summary.
  5. Press Enter to convert another file, or q to quit.

🔍 Notes on Key Types

  • DocumentToMarkdown (LMKit.Document.Conversion): entry point for every conversion. Accepts file paths, byte arrays, streams, ImageBuffer, Uri, and pre-built Attachment objects, with both synchronous and async overloads plus direct-to-file variants.
  • DocumentToMarkdownOptions: strategy, page range, OCR engine and per-image parallelism, VLM image detail and token budget, DOCX/email-specific toggles, and output shaping (front matter, separators, table rewriting, whitespace normalization).
  • DocumentToMarkdownStrategy: TextExtraction, VlmOcr, or Hybrid.
  • DocumentToMarkdownResult: aggregated Markdown plus Pages list, requested vs effective strategy, total elapsed time, and source name.
  • DocumentToMarkdownPageResult: per-page strategy, Markdown body, elapsed time, token count, quality score, warnings.
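The result types can be walked directly after a conversion. A sketch, using the property names listed in the telemetry bullet earlier (the null check on Warning assumes it is a nullable string):

```csharp
using LMKit.Document.Conversion;

var converter = new DocumentToMarkdown();
var result = converter.Convert("mixed.pdf",
    new DocumentToMarkdownOptions { Strategy = DocumentToMarkdownStrategy.Hybrid });

// Per-page breakdown: which strategy ran, how long it took, and quality.
foreach (var page in result.Pages)
{
    Console.WriteLine($"p{page.PageNumber}: {page.StrategyUsed}, " +
                      $"{page.Elapsed.TotalMilliseconds:F0} ms, " +
                      $"quality {page.QualityScore}");

    if (page.Warning != null)
        Console.WriteLine($"  warning: {page.Warning}");
}
```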

🔧 Extend the Demo

  • Add a batch mode that recursively walks a folder and writes one .md per input.
  • Pipe the Markdown into LM-Kit.NET's RAG or Structured Extraction stack to go from raw documents to Markdown to embeddings or structured JSON in one flow.
  • Add a cancellation UI by wiring CancellationToken or flipping DocumentToMarkdownPageStartingEventArgs.Cancel.
  • Replace the console log with a progress bar driven by PageStarting and PageCompleted.
  • Swap LMKitOcr for a custom OcrEngine subclass to integrate an in-house OCR service.
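For the cancellation idea above, a minimal sketch that flips the Cancel flag on the PageStarting event args (assuming cancelling simply stops the remaining pages and returns what has been converted so far):

```csharp
using LMKit.Document.Conversion;

var converter = new DocumentToMarkdown();

// Stop the run after page 3 by flipping the Cancel flag
// on the PageStarting event args.
converter.PageStarting += (_, e) =>
{
    if (e.PageNumber > 3)
        e.Cancel = true;
};

var result = converter.Convert("big-report.pdf", new DocumentToMarkdownOptions
{
    Strategy = DocumentToMarkdownStrategy.Hybrid
});
```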
