Enum DocumentToMarkdownStrategy

Namespace
LMKit.Document.Conversion
Assembly
LM-Kit.NET.dll

Specifies the strategy used by DocumentToMarkdown when transforming a document into Markdown.

public enum DocumentToMarkdownStrategy

Fields

TextExtraction = 0

Use the embedded text layer of the document. Fast and deterministic; requires no language model. Supply OcrEngine to additionally run a traditional OCR engine over image attachments, over each PDF page's embedded raster images (charts, figure legends), and as a full-page fallback when a PDF page has no native text layer (scans, flattened print-to-PDF).

VlmOcr = 1

Render each page as an image and transcribe it with a vision-language model. Best suited for scanned, handwritten, or layout-heavy documents. Requires a vision-capable LM, either user-supplied or the lazily loaded default lightonocr-2:1b.

Hybrid = 2

Apply per-page selection between TextExtraction and VlmOcr. Pages that expose a clean text layer and contain no embedded images are handled by text extraction; pages without an extractable text layer or containing embedded images are routed to the vision-language model. Image attachments always resolve to VlmOcr. Recommended default for unknown corpora and mixed-content PDFs.

Examples

using LMKit.Document.Conversion;
using LMKit.Model;

var model = LM.LoadFromModelID("lightonocr-2:1b");
var converter = new DocumentToMarkdown(model);

// Force VLM OCR on every page (e.g. for scanned PDFs).
var result = converter.Convert("invoice.pdf", new DocumentToMarkdownOptions
{
    Strategy = DocumentToMarkdownStrategy.VlmOcr
});
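
// The text-layer strategy can be paired with a traditional OCR engine so
// scanned pages and embedded images still yield content. A minimal sketch:
// the OcrEngine assignment mirrors the property named in the Remarks, and
// 'myOcrEngine' is a placeholder for a compatible engine instance — check
// the DocumentToMarkdownOptions reference for the exact member shape.
var textResult = converter.Convert("report.pdf", new DocumentToMarkdownOptions
{
    Strategy = DocumentToMarkdownStrategy.TextExtraction,
    OcrEngine = myOcrEngine // OCR fallback for image-only pages and attachments
});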

Remarks

The strategy controls how textual content is recovered from every page:

  • TextExtraction reads the embedded text layer and formats it as Markdown. It is the fastest strategy and requires no language model. On its own it produces no content for scanned pages or image-only inputs; pair it with OcrEngine to enable traditional OCR for image attachments, embedded raster enrichment on PDF pages (charts, labels, figure legends), and a full-page OCR fallback on PDF pages with no text layer. See OcrEngine for details.
  • VlmOcr renders every page as an image and asks a vision-language model to transcribe it. It recovers content from scanned, handwritten, and image-heavy documents but is the slowest strategy and requires a vision-capable LM (either user-supplied or the lazily loaded default lightonocr-2:1b).
  • Hybrid inspects each page individually. A page is routed to VlmOcr when it has no extractable text layer or when it contains embedded images; all other pages (pure-text pages) are handled by TextExtraction. Image attachments always resolve to VlmOcr. This is the recommended strategy for mixed-content documents such as PDFs combining born-digital text, scanned pages, and diagrams.
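
The per-page routing described above needs no extra configuration beyond selecting the strategy. A minimal sketch of the recommended default, mirroring the API shapes shown in the Examples section ("mixed.pdf" is a placeholder path):

```csharp
using LMKit.Document.Conversion;
using LMKit.Model;

// Hybrid routes each page individually: pages with a clean text layer and
// no embedded images use TextExtraction; scanned or image-bearing pages
// fall back to the vision-language model.
var model = LM.LoadFromModelID("lightonocr-2:1b");
var converter = new DocumentToMarkdown(model);

var result = converter.Convert("mixed.pdf", new DocumentToMarkdownOptions
{
    Strategy = DocumentToMarkdownStrategy.Hybrid
});
```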