Enum DocumentToMarkdownStrategy
- Namespace: LMKit.Document.Conversion
- Assembly: LM-Kit.NET.dll
Specifies the strategy used by DocumentToMarkdown when transforming a document into Markdown.
public enum DocumentToMarkdownStrategy
Fields
TextExtraction = 0
Use the embedded text layer of the document. Fast and deterministic, and requires no language model. Supply OcrEngine to additionally run a traditional OCR engine over image attachments, over each PDF page's embedded raster images (charts, figure legends), and as a full-page fallback when a PDF page has no native text layer (scans, flattened print-to-PDF).
VlmOcr = 1
Render each page as an image and transcribe it with a vision-language model. Best suited for scanned, handwritten, or layout-heavy documents. Requires a vision-capable LM, either user-supplied or the lazily loaded default lightonocr-2:1b.
Hybrid = 2
Apply per-page selection between TextExtraction and VlmOcr. Pages that expose a clean text layer and contain no embedded images are handled by text extraction; pages without an extractable text layer or containing embedded images are routed to the vision-language model. Image attachments always resolve to VlmOcr. Recommended default for unknown corpora and mixed-content PDFs.
Examples
using LMKit.Document.Conversion;
using LMKit.Model;
var model = LM.LoadFromModelID("lightonocr-2:1b");
var converter = new DocumentToMarkdown(model);
// Force VLM OCR on every page (e.g. for scanned PDFs).
var result = converter.Convert("invoice.pdf", new DocumentToMarkdownOptions
{
Strategy = DocumentToMarkdownStrategy.VlmOcr
});
Remarks
The strategy controls how textual content is recovered from each page:
- TextExtraction reads the embedded text layer and formats it as Markdown. It is the fastest strategy and requires no language model. On its own it produces no content for scanned pages or image-only inputs; pair it with OcrEngine to turn on traditional OCR for image attachments, embedded raster enrichment on PDF pages (charts, labels, figure legends), and a full-page OCR fallback on PDF pages with no text layer. See OcrEngine for the full rundown.
- VlmOcr renders every page as an image and asks a vision-language model to transcribe it. It recovers content from scanned, handwritten, and image-heavy documents but is the slowest strategy and requires a vision-capable LM (either user-supplied or the lazily loaded default lightonocr-2:1b).
- Hybrid inspects each page individually. A page is routed to VlmOcr when it has no extractable text layer or when it contains embedded images; all other pages (pure-text pages) are handled by TextExtraction. Image attachments always resolve to VlmOcr. This is the recommended strategy for mixed-content documents such as PDFs combining born-digital text, scanned pages, and diagrams.
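For mixed-content corpora, the per-page routing described above means a single call can cover born-digital and scanned pages alike. The sketch below mirrors the shape of the example shown earlier; the file name is illustrative, and the exact signature of the OcrEngine hook is an assumption based on the description above (consult the OcrEngine reference for its actual type):

```csharp
using LMKit.Document.Conversion;
using LMKit.Model;

// Hybrid routes each page automatically: pages with a clean text layer and
// no embedded images use TextExtraction; all other pages (and all image
// attachments) are transcribed by the vision-language model.
var model = LM.LoadFromModelID("lightonocr-2:1b");
var converter = new DocumentToMarkdown(model);

var result = converter.Convert("mixed-report.pdf", new DocumentToMarkdownOptions
{
    Strategy = DocumentToMarkdownStrategy.Hybrid
});
```

If the corpus is known to be entirely born-digital, TextExtraction avoids loading a model at all; pair it with OcrEngine only when image attachments or scanned fallback pages need coverage.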