Image-to-Markdown Vision OCR in .NET Applications


🎯 Purpose of the Sample

Image-to-Markdown Vision OCR demonstrates how to use LM-Kit.NET with vision-capable models to run on-device OCR on images (documents, screenshots, receipts, etc.) and convert them into clean Markdown-style text in an interactive loop.

The sample shows how to:

  • Download and load a vision model with progress callbacks.
  • Wrap it with LM-Kit's VlmOcr engine.
  • Feed images as Attachment objects.
  • Retrieve recognized text plus generation statistics (tokens, speed, quality, context usage).

Why Vision OCR with LM-Kit.NET?

  • Local-first: run OCR on your own hardware for privacy-sensitive workloads.
  • Unified API: same model abstraction (LM) for text and vision pipelines.
  • Rich telemetry: quality score, token usage, and performance metrics per image.
  • Drop-in: replace existing OCR engines with minimal changes to your data flow.

👥 Target Audience

  • Product & Platform – add OCR to existing .NET backends or pipelines.
  • Data & Document Processing – bulk ingest of PDFs, scans, screenshots, etc.
  • RPA / Back-office – extract text from forms, invoices, tickets, and reports.
  • Demo & Education – minimal, readable example of vision + OCR in C#.

🚀 Problem Solved

  • Turn images into text: extract readable text from screenshots, scans, or photos.
  • Model flexibility: select a model based on your available VRAM and latency needs.
  • Operational visibility: built-in stats on speed, context usage, and quality.
  • Repeatable loop: process one image after another in a single console session.

💻 Sample Application Description

Console app that:

  • Lets you choose a vision model (or paste a custom model URI).
  • Downloads the model if needed, with live progress updates.
  • Wraps it in a VlmOcr instance.
  • Repeatedly asks you for an image path, then:
    • Loads the file as an Attachment.
    • Runs OCR via ocr.Run(attachment).
    • Prints the extracted text to the console.
  • Displays a stats block (elapsed time, tokens, quality, speed, context usage).
  • Loops until you type q to quit.

✨ Key Features

  • 🧠 Vision-based OCR: uses a multimodal model behind VlmOcr.
  • 📥 Interactive loop: enter image path → get text → see metrics → repeat.
  • 📊 Telemetry:
    • Elapsed time (seconds)
    • Generated tokens count
    • Stop reason
    • Quality score
    • Token generation rate
    • Context tokens vs context size
  • 📦 Model lifecycle:
    • Automatic download on first use.
    • Loading progress shown in the console.
  • ❌ Friendly errors: clear message when an image path is invalid or inaccessible.

🧰 Built-In Models (menu)

On startup, the sample shows a model selection menu:

  Option   Model                             Approx. VRAM Needed
  0        LightOn LightOnOCR 1025 1B        ~2 GB
  1        MiniCPM 2.6 o 8.1B                ~5.9 GB
  2        Alibaba Qwen 3 2B (vision)        ~2.5 GB
  3        Alibaba Qwen 3 4B (vision)        ~4 GB
  4        Alibaba Qwen 3 8B (vision)        ~6.5 GB
  5        Google Gemma 3 4B (vision)        ~5.7 GB
  6        Google Gemma 3 12B (vision)       ~11 GB
  7        Mistral Ministral 3 3B (vision)   ~3.5 GB
  8        Mistral Ministral 3 8B (vision)   ~6.5 GB
  9        Mistral Ministral 3 14B (vision)  ~12 GB
  other    Custom model URI (GGUF / LMK)     depends on model

Any input other than 0–9 is treated as a custom model URI and passed directly to the LM constructor.
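As a sketch, the menu-to-URI mapping might look like this (the model IDs come from the Supported Models list below; the exact menu code in the sample may differ, and `ResolveModelUri` is a hypothetical helper name):

```csharp
using LMKit.Model;

// Hypothetical helper: map a menu choice to a model URI.
// Any input outside 0-9 is passed through as a custom model URI,
// exactly as the sample's behavior is described above.
static string ResolveModelUri(string input)
{
    string[] ids =
    {
        "lightonocr1025:1b", "minicpm-o",
        "qwen3-vl:2b", "qwen3-vl:4b", "qwen3-vl:8b",
        "gemma3:4b", "gemma3:12b",
        "ministral3:3b", "ministral3:8b", "ministral3:14b"
    };

    if (int.TryParse(input, out int choice) && choice >= 0 && choice < ids.Length)
    {
        // Resolve a predefined model card to its download URI.
        return ModelCard.GetPredefinedModelCardByModelID(ids[choice]).ModelUri.ToString();
    }

    // Anything else is treated as a custom model URI.
    return input;
}
```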


🧠 Supported Models

The sample is pre-wired to LM-Kit's predefined model cards:

  • lightonocr1025:1b
  • minicpm-o
  • qwen3-vl:2b
  • qwen3-vl:4b
  • qwen3-vl:8b
  • gemma3:4b
  • gemma3:12b
  • ministral3:3b
  • ministral3:8b
  • ministral3:14b

Internally:

modelLink = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri
    .ToString();

You can also provide any valid model URI manually (including local paths or custom model servers) by typing/pasting it when prompted.


πŸ› οΈ Commands & Flow

Inside the console loop:

  • On startup

    • Select a model (0–9) or paste a custom model URI.
    • The model is downloaded (if needed) and loaded with progress reporting.
  • Per image

    • The app prompts: enter image path (or 'q' to quit):
    • Type a file path and press Enter.
    • The app loads it into an Attachment and runs OCR.
    • Text is printed, followed by a Stats section.
    • Then:
      • Press Enter to process another image, or
      • Type q to exit.
  • Quit

    • At any image prompt or "process another image" prompt, q exits the app cleanly.
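The per-image loop above can be sketched as follows (a minimal outline; the prompt wording matches the description, but error handling in the actual sample may differ):

```csharp
using System;
using LMKit.Data;
using LMKit.Extraction.Ocr;

// Assumes 'ocr' is an already-initialized VlmOcr instance.
static void OcrLoop(VlmOcr ocr)
{
    while (true)
    {
        Console.Write("enter image path (or 'q' to quit): ");
        string input = Console.ReadLine()?.Trim() ?? "";

        if (input.Length == 0 || input.Equals("q", StringComparison.OrdinalIgnoreCase))
        {
            break; // clean exit, as described above
        }

        try
        {
            // Load the file and run OCR on it.
            var attachment = new Attachment(input);
            var result = ocr.Run(attachment);
            Console.WriteLine(result.PageElement.Text);
        }
        catch (Exception ex)
        {
            // Invalid or inaccessible paths surface here.
            Console.WriteLine($"Error: {ex.Message}");
        }
    }
}
```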

πŸ—£οΈ Example Use Cases

Try the sample with:

  • A scanned invoice → extract all text before sending it to your backend.
  • A screenshot of a web page → capture titles and paragraph content.
  • A photo of a document from a phone → sanity-check OCR quality & speed.
  • A code screenshot → pull code into a text editor for quick edits.
  • A multi-language flyer β†’ see how the model handles different languages.

After each run, compare:

  • Quality score – does the text look correct vs. the image?
  • Token usage & speed – does a bigger model give better quality at acceptable latency?

βš™οΈ Behavior & Policies (quick reference)

  • Model selection: exactly one model per process. To change models, restart the app.
  • Download & load:
    • ModelDownloadingProgress prints Downloading model XX.XX% or byte counts.
    • ModelLoadingProgress prints Loading model XX% and clears the console once done.
  • OCR engine:
    • VlmOcr runs OCR with the selected vision model.
    • result.PageElement.Text is the recognized text for the page.
  • Generation stats:
    • result.TextGeneration.GeneratedTokens.Count
    • result.TextGeneration.TerminationReason
    • result.TextGeneration.QualityScore
    • result.TextGeneration.TokenGenerationRate
    • result.TextGeneration.ContextTokens.Count / result.TextGeneration.ContextSize
  • Licensing:
    • You can set an optional license key via LicenseManager.SetLicenseKey("").
    • A free community license is available from the LM-Kit website.
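The download/load progress behavior above can be wired up roughly like this (callback shapes follow the minimal snippet later in this README; exact parameter types may vary between LM-Kit versions, so treat this as a sketch):

```csharp
using System;
using LMKit.Model;

// Hedged sketch: progress callbacks that mirror the console output
// described above ("Downloading model XX.XX%" / "Loading model XX%").
var lm = new LM(
    new Uri(modelUri),
    downloadingProgress: (path, contentLength, bytesRead) =>
    {
        // Assumption: contentLength may be unknown for some servers.
        if (contentLength > 0)
        {
            Console.Write($"\rDownloading model {100.0 * bytesRead / contentLength:0.00}%");
        }
        else
        {
            Console.Write($"\rDownloading model {bytesRead} bytes");
        }
        return true; // return false to cancel the download
    },
    loadingProgress: progress =>
    {
        Console.Write($"\rLoading model {(int)(progress * 100)}%");
        return true;
    });
```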

💻 Minimal Integration Snippet

using System;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

public class VisionOcrSample
{
    public void RunOcr(string modelUri, string imagePath)
    {
        // Load the vision model
        var lm = new LM(
            new Uri(modelUri),
            downloadingProgress: (path, contentLength, bytesRead) => true,
            loadingProgress: progress => true);

        // Create OCR engine
        var ocr = new VlmOcr(lm);

        // Wrap the image as an Attachment
        var attachment = new Attachment(imagePath);

        // Run OCR
        var result = ocr.Run(attachment);

        // Extracted text
        Console.WriteLine(result.PageElement.Text);

        // Optional: generation stats
        Console.WriteLine($"Tokens   : {result.TextGeneration.GeneratedTokens.Count}");
        Console.WriteLine($"Quality  : {result.TextGeneration.QualityScore}");
        Console.WriteLine($"Speed    : {result.TextGeneration.TokenGenerationRate} tok/s");
    }
}

Use this pattern to integrate OCR into web APIs, background workers, or desktop apps.


πŸ› οΈ Getting Started

📋 Prerequisites

  • .NET Framework 4.6.2 or .NET 6.0

📥 Download

git clone https://github.com/LM-Kit/lm-kit-net-samples.git
cd lm-kit-net-samples/console_net/image_to_markdown

Project Link: image_to_markdown (same path as above)

▶️ Run

dotnet build
dotnet run

Then:

  1. Select a vision model by typing 0–9, or paste a custom model URI.
  2. Wait for the model to download (first run) and load.
  3. When prompted, type the path to an image file (or q to quit).
  4. Inspect the recognized text and Stats block.
  5. Press Enter to process another image, or q to exit.

πŸ” Notes on Key Types

  • LM (LMKit.Model) – generic model wrapper used by LM-Kit.NET:

    • Accepts a Uri pointing to the model.
    • Uses callbacks for download and load progress.
  • VlmOcr (LMKit.Extraction.Ocr) – OCR engine built on top of a vision model:

    • Run(Attachment) → returns an OCR result with PageElement and TextGeneration.
  • Attachment (LMKit.Data) – wraps external data (here: image files):

    • new Attachment(string path) loads an image from disk.
    • Exceptions are raised when the path is invalid or inaccessible.
  • TextGeneration – metadata about the underlying generative pass:

    • GeneratedTokens, TerminationReason, QualityScore, TokenGenerationRate, ContextTokens, ContextSize.
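Putting the TextGeneration fields together, the stats block printed after each image might look like this (property names are taken from the list above; the Stopwatch-based elapsed-time measurement is our own addition, not necessarily how the sample measures it):

```csharp
using System;
using System.Diagnostics;

// 'ocr' is a VlmOcr instance; 'attachment' wraps the image file.
var stopwatch = Stopwatch.StartNew();
var result = ocr.Run(attachment);
stopwatch.Stop();

// Print a stats block from the generation metadata.
var gen = result.TextGeneration;
Console.WriteLine("-- Stats --");
Console.WriteLine($"Elapsed  : {stopwatch.Elapsed.TotalSeconds:0.00} s");
Console.WriteLine($"Tokens   : {gen.GeneratedTokens.Count}");
Console.WriteLine($"Stop     : {gen.TerminationReason}");
Console.WriteLine($"Quality  : {gen.QualityScore}");
Console.WriteLine($"Speed    : {gen.TokenGenerationRate} tok/s");
Console.WriteLine($"Context  : {gen.ContextTokens.Count}/{gen.ContextSize}");
```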

⚠️ Troubleshooting

  • “Error: Unable to open '…'.”

    • The path is wrong, the file doesn't exist, or permissions are missing.
    • Check the path, fix permissions, then try again.
  • Slow or failing model load

    • Insufficient VRAM/CPU or slow storage/network.
    • Try a smaller model (e.g., LightOnOCR 1B, Qwen 3 2B, Ministral 3B).
  • Out-of-memory or driver errors

    • VRAM not sufficient for the selected model.
    • Pick a model with lower VRAM requirements or upgrade hardware.
  • Poor OCR quality

    • Try a larger or OCR-focused model (e.g., LightOnOCR 1B or higher-capacity vision models).
    • Ensure the image is sharp, not heavily compressed, and roughly upright.

🔧 Extend the Demo

  • Use VlmOcr in a web API to provide OCR as a service.
  • Pipe the extracted text into:
    • RAG pipelines,
    • downstream NLP (classification, sentiment, extraction),
    • or your own business logic.
  • Add batch processing (multiple images per run) or directory watchers.
  • Post-process PageElement.Text to:
    • normalize whitespace,
    • detect sections (headers, paragraphs),
    • or convert into your own document format.
  • Combine with LM-Kit's Text Analysis or Structured Extraction to go from
    image β†’ text β†’ structured data in one flow.
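As a starting point for the whitespace-normalization idea above, a small sketch using only plain .NET (no LM-Kit types involved; `NormalizeOcrText` is a hypothetical helper name):

```csharp
using System;
using System.Text.RegularExpressions;

// Collapse runs of spaces/tabs within lines and trim each line,
// while keeping the line/paragraph structure of the OCR output.
static string NormalizeOcrText(string text)
{
    // Collapse horizontal whitespace inside each line.
    string collapsed = Regex.Replace(text, @"[ \t]+", " ");

    // Trim each line, then drop leading/trailing blank lines.
    var lines = collapsed.Replace("\r\n", "\n").Split('\n');
    for (int i = 0; i < lines.Length; i++)
    {
        lines[i] = lines[i].Trim();
    }
    return string.Join("\n", lines).Trim();
}
```

Apply it to `result.PageElement.Text` before feeding the output into downstream NLP or RAG steps.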