Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document_to_markdown

# Document-to-Markdown Vision OCR in .NET Applications

## Purpose of the Sample
Document-to-Markdown Vision OCR demonstrates how to use LM-Kit.NET with vision-capable models to run on-device OCR on images and PDF documents (scans, screenshots, receipts, reports, etc.) and convert them into clean text or Markdown-style output, in an interactive loop.
The sample shows how to:
- Download and load a vision model with progress callbacks.
- Wrap it with LM-Kit's `VlmOcr` engine.
- Feed images or PDFs as `Attachment` objects.
- Process multi-page inputs using `Attachment.PageCount`.
- Retrieve recognized text plus generation statistics (tokens, speed, quality, context usage).
## Why Vision OCR with LM-Kit.NET?
- Local-first: run OCR on your own hardware for privacy-sensitive workloads.
- Unified API: the same model abstraction (`LM`) serves text and vision pipelines.
- Rich telemetry: quality score, token usage, and performance metrics per page.
- Drop-in: replace existing OCR engines with minimal changes to your data flow.
## Target Audience
- Product and platform teams - add OCR to existing .NET backends or pipelines.
- Data and document processing - bulk ingestion of PDFs, scans, screenshots, etc.
- RPA and back-office automation - extract text from forms, invoices, tickets, and reports.
- Demos and education - a minimal, readable example of vision + OCR in C#.
## Problem Solved
- Turn images and PDFs into text: extract readable text from photos, screenshots, scans, and PDF pages.
- Model flexibility: select a model based on your available VRAM and latency needs.
- Operational visibility: built-in stats on speed, context usage, and quality.
- Repeatable loop: process one file after another in a single console session.
- Multi-page handling: iterate through PDF pages automatically with `PageCount`.
## Sample Application Description
A console app that:

1. Lets you choose a vision model (or paste a custom model URI).
2. Downloads the model if needed, with live progress updates.
3. Wraps it in a `VlmOcr` instance.
4. Repeatedly asks you for a file path (image or PDF), then:
   - Loads the file as an `Attachment`.
   - Runs OCR page-by-page via `ocr.Run(attachment, pageIndex)`.
   - Prints the extracted text to the console.
   - Displays a stats block (elapsed time, tokens, quality, speed, context usage).
5. Loops until you type `q` to quit.
## Key Features
- Vision-based OCR: uses a multimodal model behind `VlmOcr`.
- Image + PDF support: the same code path handles both formats.
- Interactive loop: enter a file path -> get text -> see metrics -> repeat.
- Multi-page aware: prints results per page using `attachment.PageCount`.
- Telemetry:
  - Elapsed time (seconds)
  - Generated token count
  - Stop reason
  - Quality score
  - Token generation rate
  - Context tokens vs. context size
- Model lifecycle:
  - Automatic download on first use.
  - Loading progress shown in the console.
- Friendly errors: a clear message when a file path is invalid or inaccessible.
## Built-In Models (menu)
On startup, the sample shows a model selection menu:
| Option | Model | Approx. VRAM Needed |
|---|---|---|
| 0 | LightOn LightOnOCR 1025 1B | ~2 GB VRAM |
| 1 | MiniCPM-o 2.6 8.1B | ~5.9 GB VRAM |
| 2 | Alibaba Qwen 3 2B (vision) | ~2.5 GB VRAM |
| 3 | Alibaba Qwen 3 4B (vision) | ~4 GB VRAM |
| 4 | Alibaba Qwen 3 8B (vision) | ~6.5 GB VRAM |
| 5 | Google Gemma 3 4B (vision) | ~5.7 GB VRAM |
| 6 | Google Gemma 3 12B (vision) | ~11 GB VRAM |
| 7 | Mistral Ministral 3 3B (vision) | ~3.5 GB VRAM |
| 8 | Mistral Ministral 3 8B (vision) | ~6.5 GB VRAM |
| 9 | Mistral Ministral 3 14B (vision) | ~12 GB VRAM |
| other | Custom model URI (GGUF / LMK, etc.) | depends on model |
Any input other than `0`-`9` is treated as a custom model URI and passed directly to the `LM` constructor.
## Supported Models

The sample is pre-wired to LM-Kit's predefined model cards:

- `lightonocr1025:1b`
- `minicpm-o`
- `qwen3-vl:2b`
- `qwen3-vl:4b`
- `qwen3-vl:8b`
- `gemma3:4b`
- `gemma3:12b`
- `ministral3:3b`
- `ministral3:8b`
- `ministral3:14b`
Internally:

```csharp
modelLink = ModelCard
    .GetPredefinedModelCardByModelID("qwen3-vl:4b")
    .ModelUri
    .ToString();
```
You can also provide any valid model URI manually (including local paths or custom model servers) by typing or pasting it when prompted.
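The menu-to-model-ID mapping above can be sketched as a small lookup. `ModelMenu` and `Resolve` are illustrative names, not part of the sample's source; the dictionary follows the menu order shown in the table.

```csharp
using System.Collections.Generic;

// Hypothetical helper: maps a menu option (0-9) to the predefined LM-Kit
// model ID, following the menu order shown above. Any other input is
// treated as a custom model URI, mirroring the sample's behavior.
public static class ModelMenu
{
    public static readonly IReadOnlyDictionary<int, string> ModelIds =
        new Dictionary<int, string>
        {
            [0] = "lightonocr1025:1b",
            [1] = "minicpm-o",
            [2] = "qwen3-vl:2b",
            [3] = "qwen3-vl:4b",
            [4] = "qwen3-vl:8b",
            [5] = "gemma3:4b",
            [6] = "gemma3:12b",
            [7] = "ministral3:3b",
            [8] = "ministral3:8b",
            [9] = "ministral3:14b",
        };

    // Returns the predefined model ID for options 0-9;
    // otherwise the raw input is returned as a custom model URI.
    public static string Resolve(string input) =>
        int.TryParse(input, out var option) && ModelIds.TryGetValue(option, out var id)
            ? id
            : input;
}
```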
## Commands and Flow
Inside the console loop:
**On startup**

- Select a model (`0`-`9`) or paste a custom model URI.
- The model is downloaded (if needed) and loaded with progress reporting.

**Per document (image or PDF)**

1. The app prompts: `enter file path (image or PDF) (or 'q' to quit):`
2. Type a file path and press Enter.
3. The app loads it into an `Attachment`.
4. The app iterates pages:
   - For images, this is typically 1 page.
   - For PDFs, this can be N pages.
5. For each page, OCR runs and prints:
   - The recognized text or Markdown
   - A Stats section
**Quit**

- At any prompt, typing `q` exits the app cleanly.
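The prompt/quit flow above can be sketched as a reusable loop. `OcrLoop.Run` and the `processDocument` callback are illustrative abstractions (not the sample's actual structure), kept separate from the OCR call so the loop itself stays testable.

```csharp
using System;

// Sketch of the interactive console loop: prompt for a path, hand it to a
// per-document callback, and stop when the user types 'q'.
public static class OcrLoop
{
    public static void Run(Func<string> readLine, Action<string> processDocument)
    {
        while (true)
        {
            Console.Write("enter file path (image or PDF) (or 'q' to quit): ");
            var input = readLine()?.Trim();
            if (string.IsNullOrEmpty(input))
                continue; // empty input: prompt again
            if (input.Equals("q", StringComparison.OrdinalIgnoreCase))
                break;    // clean exit
            try
            {
                processDocument(input);
            }
            catch (Exception ex)
            {
                // Friendly message when the path is invalid or inaccessible.
                Console.WriteLine($"error: {ex.Message}");
            }
        }
    }
}
```

In the real sample, `processDocument` would load the path into an `Attachment` and run OCR page-by-page.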
## Example Use Cases
Try the sample with:
- A scanned invoice image -> extract all text before sending it to your backend.
- A PDF report (multi-page) -> convert page-by-page to Markdown.
- A screenshot of a web page -> capture titles and paragraph content.
- A photo of a document from a phone -> sanity-check OCR quality and speed.
- A code screenshot -> pull code into a text editor for quick edits.
- A multi-language flyer -> see how the model handles different languages.
After each run, compare:
- Quality score - does the text look correct vs. the page?
- Token usage and speed - does a bigger model give better quality at acceptable latency?
## Behavior and Policies (quick reference)
Model selection: exactly one model per process. To change models, restart the app.
Download and load:

- `ModelDownloadingProgress` prints `Downloading model XX.XX%` or byte counts.
- `ModelLoadingProgress` prints `Loading model XX%` and clears the console once done.
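A minimal sketch of formatting those progress strings, assuming invariant-culture number formatting; `ProgressFormat` and its method names are hypothetical, not the sample's actual callbacks.

```csharp
using System.Globalization;

// Hypothetical helpers that produce the progress strings described above.
public static class ProgressFormat
{
    // "Downloading model XX.XX%" when the total size is known,
    // otherwise a raw byte count.
    public static string Download(long? contentLength, long bytesRead) =>
        contentLength is long total && total > 0
            ? string.Format(CultureInfo.InvariantCulture,
                "Downloading model {0:F2}%", 100.0 * bytesRead / total)
            : string.Format(CultureInfo.InvariantCulture,
                "Downloading model {0} bytes", bytesRead);

    // "Loading model XX%".
    public static string Load(double progress) =>
        string.Format(CultureInfo.InvariantCulture,
            "Loading model {0:F0}%", progress * 100);
}
```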
OCR engine:

- `VlmOcr` runs OCR with the selected vision model.
- `result.PageElement.Text` is the recognized text for the page.
Multi-page processing:

- `Attachment.PageCount` is used to iterate over pages.
- OCR is executed per page using `ocr.Run(attachment, pageIndex)`.
Generation stats:

- `result.TextGeneration.GeneratedTokens.Count`
- `result.TextGeneration.TerminationReason`
- `result.TextGeneration.QualityScore`
- `result.TextGeneration.TokenGenerationRate`
- `result.TextGeneration.ContextTokens.Count` / `result.TextGeneration.ContextSize`
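Those fields can be collected into a printable stats block. `StatsBlock.Format` is a hypothetical helper that takes plain values rather than the `TextGeneration` object, so it can be reused and tested outside the sample.

```csharp
using System.Globalization;

// Hypothetical helper: renders a per-page stats block like the one the
// sample prints (elapsed time, tokens, stop reason, quality, speed, context).
public static class StatsBlock
{
    public static string Format(
        double elapsedSeconds, int generatedTokens, string stopReason,
        double qualityScore, double tokensPerSecond,
        int contextTokens, int contextSize)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "Elapsed : {0:F2} s\n" +
            "Tokens  : {1}\n" +
            "Stop    : {2}\n" +
            "Quality : {3:F2}\n" +
            "Speed   : {4:F2} tok/s\n" +
            "Context : {5}/{6}",
            elapsedSeconds, generatedTokens, stopReason,
            qualityScore, tokensPerSecond, contextTokens, contextSize);
    }
}
```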
Licensing:

- You can set an optional license key via `LicenseManager.SetLicenseKey("")`.
- A free community license is available from the LM-Kit website.
## Minimal Integration Snippet
```csharp
using System;
using LMKit.Data;
using LMKit.Extraction.Ocr;
using LMKit.Model;

public class VisionOcrSample
{
    public void RunOcr(string modelUri, string filePath)
    {
        // Load the vision model
        var lm = new LM(
            new Uri(modelUri),
            downloadingProgress: (path, contentLength, bytesRead) => true,
            loadingProgress: progress => true);

        // Create the OCR engine
        var ocr = new VlmOcr(lm);

        // Wrap the file (image or PDF) as an Attachment
        var attachment = new Attachment(filePath);

        // Run OCR page-by-page (PDFs can be multi-page; images are usually 1 page)
        for (int pageIndex = 0; pageIndex < attachment.PageCount; pageIndex++)
        {
            var result = ocr.Run(attachment, pageIndex);

            // Extracted text / Markdown
            Console.WriteLine(result.PageElement.Text);

            // Optional: generation stats
            Console.WriteLine($"Tokens  : {result.TextGeneration.GeneratedTokens.Count}");
            Console.WriteLine($"Quality : {result.TextGeneration.QualityScore}");
            Console.WriteLine($"Speed   : {result.TextGeneration.TokenGenerationRate} tok/s");
        }
    }
}
```
Use this pattern to integrate OCR into web APIs, background workers, or desktop apps.
## Getting Started

### Prerequisites
- .NET Framework 4.6.2 or .NET 8.0+
### Download

```bash
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document_to_markdown
```
Project Link: document_to_markdown (same path as above)
### Run

```bash
dotnet build
dotnet run
```
Then:
- Select a vision model by typing `0`-`9`, or paste a custom model URI.
- Wait for the model to download (first run) and load.
- When prompted, type the path to an image or PDF file (or `q` to quit).
- Inspect the recognized text and Stats block (per page).
- Press Enter to process another file, or type `q` to exit.
## Notes on Key Types
`LM` (`LMKit.Model`) - generic model wrapper used by LM-Kit.NET:

- Accepts a `Uri` pointing to the model.
- Uses callbacks for download and load progress.

`VlmOcr` (`LMKit.Extraction.Ocr`) - OCR engine built on top of a vision model:

- `Run(Attachment, pageIndex)` returns an OCR result with `PageElement` and `TextGeneration`.

`Attachment` (`LMKit.Data`) - wraps external data (here: image files and PDFs):

- `new Attachment(string path)` loads a file from disk.
- `PageCount` exposes the number of pages (images are typically 1; PDFs can be many).
- Exceptions are raised when the path is invalid or inaccessible.

`TextGeneration` - metadata about the underlying generative pass:

- `GeneratedTokens`, `TerminationReason`, `QualityScore`, `TokenGenerationRate`, `ContextTokens`, `ContextSize`.
## Extend the Demo

- Write output to disk (`--out output.md`) instead of only printing to the console.
- Add page selection for PDFs (`--pages 1,3-5`).
- Add batch mode: process a directory of files.
- Post-process `PageElement.Text` to:
  - normalize whitespace,
  - detect sections (headers, paragraphs),
  - or convert into your own document format.
- Combine with LM-Kit's Structured Extraction to go from document -> markdown -> structured data in one flow.
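The whitespace-normalization step suggested above can be sketched in plain C#; `OcrPostProcess` is an illustrative name and the exact rules are assumptions, not part of the sample.

```csharp
using System.Text.RegularExpressions;

// Minimal sketch of a whitespace-normalization pass for OCR output:
// collapse runs of spaces/tabs, trim line ends, and squeeze 3+ blank
// lines down to a single blank line between paragraphs.
public static class OcrPostProcess
{
    public static string NormalizeWhitespace(string text)
    {
        // Unify line endings first.
        text = text.Replace("\r\n", "\n").Replace('\r', '\n');
        // Collapse runs of spaces and tabs inside a line.
        text = Regex.Replace(text, "[ \t]+", " ");
        // Trim trailing spaces at line ends.
        text = Regex.Replace(text, " +\n", "\n");
        // Squeeze 3+ consecutive newlines down to a paragraph break.
        text = Regex.Replace(text, "\n{3,}", "\n\n");
        return text.Trim();
    }
}
```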