Convert Documents to Markdown with VLM OCR

Scanned PDFs, photographed whiteboards, and image-based documents contain valuable text locked inside pixels. LM-Kit.NET's VlmOcr class uses Vision Language Models to convert them into structured Markdown, preserving headings, tables, lists, and code blocks. Unlike traditional OCR, which outputs a flat stream of text, VLM OCR understands document layout and produces properly formatted output. This tutorial builds a document-to-Markdown converter that handles single images, multi-page PDFs, and entire folders of scanned files.


Why VLM OCR Over Traditional OCR

Two practical advantages of vision-model OCR:

  1. Structure preservation. Traditional OCR produces a flat string of characters. VLM OCR understands that a bold line at the top is a heading, that aligned columns are a table, and that indented text is a list. The output is ready-to-use Markdown, not raw text that needs post-processing.
  2. Handwriting and poor scans. Traditional OCR struggles with handwritten notes, low-resolution scans, and photographs of documents. Vision models handle degraded inputs because they understand context, not just character shapes.

Prerequisites

| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 1.5+ GB |
| Disk | ~1 GB free for model download |

Step 1: Create the Project

dotnet new console -n OcrQuickstart
cd OcrQuickstart
dotnet add package LM-Kit.NET

Step 2: Convert an Image to Markdown

using System.Text;
using LMKit.Model;
using LMKit.Extraction.Ocr;
using LMKit.Graphics;

LMKit.Licensing.LicenseManager.SetLicenseKey("");

Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;

// ──────────────────────────────────────
// 1. Load a model trained for OCR
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("lightonocr-2:1b",
    downloadingProgress: (_, len, read) =>
    {
        if (len.HasValue) Console.Write($"\r  Downloading: {(double)read / len.Value * 100:F1}%   ");
        return true;
    },
    loadingProgress: p => { Console.Write($"\r  Loading: {p * 100:F0}%   "); return true; });
Console.WriteLine("\n");

// ──────────────────────────────────────
// 2. Convert an image to Markdown
// ──────────────────────────────────────
var ocr = new VlmOcr(model);

var image = new ImageBuffer("scanned_document.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);

string markdown = result.TextGeneration.Completion;
Console.WriteLine(markdown);

// Save to file
File.WriteAllText("output.md", markdown);
Console.WriteLine("\nSaved to output.md");

Step 3: Convert a PDF (Multi-Page)

Process each page of a PDF and combine the results:

using LMKit.Data;

var ocr = new VlmOcr(model)
{
    MaximumCompletionTokens = 4096
};

string pdfPath = "report.pdf";
var attachment = new Attachment(pdfPath);

// Get page count
int pageCount = attachment.PageCount;
Console.WriteLine($"Processing {pageCount} pages from {Path.GetFileName(pdfPath)}...\n");

var fullDocument = new StringBuilder();

for (int page = 0; page < pageCount; page++)
{
    Console.Write($"  Page {page + 1}/{pageCount}... ");

    VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: page);
    string pageMarkdown = pageResult.TextGeneration.Completion;

    fullDocument.AppendLine($"<!-- Page {page + 1} -->");
    fullDocument.AppendLine(pageMarkdown);
    fullDocument.AppendLine();

    int tokens = pageResult.TextGeneration.GeneratedTokenCount;
    Console.WriteLine($"{tokens} tokens generated");
}

string outputPath = Path.ChangeExtension(pdfPath, ".md");
File.WriteAllText(outputPath, fullDocument.ToString());
Console.WriteLine($"\nSaved {pageCount} pages to {outputPath}");

Step 4: Custom Instructions

The Instruction property guides the model on how to transcribe the content. This is useful for specialized documents:

// Default: general document transcription
var ocr = new VlmOcr(model);

// Focus on tables and structured data
ocr.Instruction = "Extract all tables as Markdown tables. Preserve column alignment and headers.";

// Focus on code
ocr.Instruction = "This is a screenshot of source code. Transcribe as a fenced code block with language annotation.";

// Focus on forms
ocr.Instruction = "This is a scanned form. Extract each field as a key-value pair in Markdown.";

var image = new ImageBuffer("form_scan.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.TextGeneration.Completion);
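
If a batch mixes document types, the instruction can be picked per file before calling Run. The sketch below shows one way to do that; the ChooseInstruction helper, the file-name keywords, and the fallback instruction text are hypothetical, while the Instruction property and Run call come from the examples above.

// Hypothetical helper: map a file-name hint to a transcription instruction.
static string ChooseInstruction(string fileName) => fileName.ToLowerInvariant() switch
{
    var f when f.Contains("form") || f.Contains("invoice") =>
        "This is a scanned form. Extract each field as a key-value pair in Markdown.",
    var f when f.Contains("code") =>
        "This is a screenshot of source code. Transcribe as a fenced code block with language annotation.",
    _ => "Transcribe the document as clean Markdown, preserving headings, tables, lists, and code blocks."
};

string scanPath = "invoice_scan.png";
ocr.Instruction = ChooseInstruction(scanPath);

VlmOcr.VlmOcrResult formResult = ocr.Run(new ImageBuffer(scanPath));
Console.WriteLine(formResult.TextGeneration.Completion);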

Step 5: Batch Conversion

Convert an entire folder of images or PDFs:

string inputDir = "scanned_docs";
string outputDir = "markdown_output";
Directory.CreateDirectory(outputDir);

string[] supportedExtensions = { ".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp", ".pdf" };

string[] files = Directory.GetFiles(inputDir)
    .Where(f => supportedExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
    .ToArray();

Console.WriteLine($"Converting {files.Length} files...\n");

var ocr = new VlmOcr(model)
{
    MaximumCompletionTokens = 4096
};

foreach (string file in files)
{
    string fileName = Path.GetFileName(file);
    Console.Write($"  {fileName}... ");

    var attachment = new Attachment(file);

    if (attachment.PageCount > 1)
    {
        // Multi-page document
        var pages = new StringBuilder();
        for (int p = 0; p < attachment.PageCount; p++)
        {
            VlmOcr.VlmOcrResult pageResult = ocr.Run(attachment, pageIndex: p);
            pages.AppendLine(pageResult.TextGeneration.Completion);
            pages.AppendLine();
        }

        string outPath = Path.Combine(outputDir, Path.ChangeExtension(fileName, ".md"));
        File.WriteAllText(outPath, pages.ToString());
        Console.WriteLine($"{attachment.PageCount} pages");
    }
    else
    {
        // Single page/image
        VlmOcr.VlmOcrResult result = ocr.Run(attachment);
        string outPath = Path.Combine(outputDir, Path.ChangeExtension(fileName, ".md"));
        File.WriteAllText(outPath, result.TextGeneration.Completion);
        Console.WriteLine("done");
    }
}

Console.WriteLine($"\nAll files saved to {outputDir}/");

Step 6: Performance Metrics

Track token generation speed and processing time:

var stopwatch = System.Diagnostics.Stopwatch.StartNew();

VlmOcr.VlmOcrResult result = ocr.Run(new ImageBuffer("document.png"));

stopwatch.Stop();

int tokens = result.TextGeneration.GeneratedTokenCount;
double seconds = stopwatch.Elapsed.TotalSeconds;
double tokensPerSecond = tokens / seconds;

Console.WriteLine($"Tokens generated: {tokens}");
Console.WriteLine($"Time elapsed:     {seconds:F1}s");
Console.WriteLine($"Speed:            {tokensPerSecond:F1} tokens/s");
Console.WriteLine($"Output length:    {result.TextGeneration.Completion.Length} characters");

Model Selection for VLM OCR

| Model ID | VRAM | Speed | Quality | Best For |
|---|---|---|---|---|
| lightonocr-2:1b | ~2 GB | Fastest | Very good | Purpose-built OCR model (recommended) |
| qwen3-vl:2b | ~2.5 GB | Very fast | Good | Lightweight multilingual OCR |
| ministral3:3b | ~3.5 GB | Fast | Good | Compact general-purpose VLM |
| qwen3-vl:4b | ~4 GB | Fast | Very good | Multilingual documents |
| gemma3:4b | ~5.7 GB | Moderate | Good | Mixed text and vision tasks |
| minicpm-o-45 | ~5.9 GB | Moderate | Very good | Strong all-round vision model |
| qwen3-vl:8b | ~6.5 GB | Moderate | Excellent | High-quality multilingual OCR |
| ministral3:8b | ~6.5 GB | Moderate | Very good | Complex document layouts |
| gemma3:12b | ~11 GB | Slow | Excellent | Complex layouts, tables, handwriting |
| ministral3:14b | ~12 GB | Slow | Excellent | Highest quality for critical documents |

LightOnOCR 2 is a compact 1B model specifically trained for high-accuracy OCR and document understanding. It delivers fast, layout-aware text extraction and is the best choice for dedicated OCR workloads. For multilingual documents, the Qwen3-VL family offers strong results. Use a larger model like gemma3:12b or ministral3:14b when dealing with complex layouts, degraded scans, or handwriting.
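
Switching models only changes the ID passed to LM.LoadFromModelID; the rest of the pipeline stays the same. A minimal sketch for multilingual documents, using one of the Qwen3-VL IDs from the table above; it assumes the download and loading progress callbacks shown in Step 2 are optional parameters and omits them for brevity.

// Same loading call as Step 2, pointing at a different model ID from the table above.
// The progress callbacks from Step 2 are omitted here, assuming they remain optional.
using LM model = LM.LoadFromModelID("qwen3-vl:8b");

var ocr = new VlmOcr(model);
VlmOcr.VlmOcrResult result = ocr.Run(new ImageBuffer("multilingual_scan.png"));
Console.WriteLine(result.TextGeneration.Completion);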


Common Issues

| Problem | Cause | Fix |
|---|---|---|
| Output truncated mid-sentence | MaximumCompletionTokens too low | Increase to 4096 or higher, or set to -1 for unlimited (see the sketch below) |
| Image markup in output (![](...)) | StripImageMarkup is false | Set ocr.StripImageMarkup = true (the default) |
| Tables not properly formatted | Model struggles with complex table layouts | Use a larger model; add an Instruction that asks for table extraction |
| Slow on large PDFs | All pages processed sequentially | Process pages in parallel (e.g., with async), or convert only the pages you need |
| Blank output | Image too small or low contrast | Resize the image before processing; improve scan quality |
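
The two most common fixes in the table are one-line property changes. A minimal sketch, using only the properties documented above:

var ocr = new VlmOcr(model)
{
    // Raise the cap, or use -1 for unlimited, if long pages come back truncated.
    MaximumCompletionTokens = -1,

    // Strip image placeholders such as ![](...) from the generated Markdown (already the default).
    StripImageMarkup = true
};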

Next Steps