Enum VlmOcrIntent

Namespace
LMKit.Extraction.Ocr
Assembly
LM-Kit.NET.dll

Specifies the desired outcome of a VlmOcr operation.

public enum VlmOcrIntent

Fields

Undefined = 0

No explicit intent specified. The engine selects a default intent based on the loaded model: plain-text OCR for models that support it natively (for example, PaddleOCR-VL), Markdown conversion for general-purpose models.

PlainText = 1

Extract text from the image as plain, unformatted text without any markup, coordinates, or structural annotations.

TableRecognition = 2

Detect and extract tabular structures, preserving rows and columns.

FormulaRecognition = 3

Recognize and transcribe mathematical formulas.

ChartRecognition = 4

Interpret charts, graphs, and data visualizations.

OcrWithCoordinates = 5

Extract text together with bounding-box coordinates for each detected region. The coordinate format is model-dependent.

SealRecognition = 6

Recognize and transcribe stamps, seals, and similar graphical marks.

Markdown = 7

Transcribe the page content as Markdown, preserving headings, lists, emphasis, and other structural elements.

Examples

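using System;
using System.Collections.Generic;
using LMKit.Extraction.Ocr;

// `model` is assumed to be an already loaded vision-language model instance.

// Create a VlmOcr with an explicit intent
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition);

// Query the intents supported by a specific model
IReadOnlyList<VlmOcrIntent> intents = VlmOcr.GetSupportedIntents(model);
foreach (var intent in intents)
{
    Console.WriteLine(intent);
}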

Remarks

Each member represents a high-level intent describing what the caller expects from the OCR engine. Not every vision-language model natively supports every intent. The engine maps each intent to the best available instruction and post-processing strategy for the loaded model. When a model lacks native support for an intent, the engine still applies its internal logic to get as close as possible to the requested result.
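Because support varies by model, a caller may want to check the supported intents before committing to one. The following is a minimal sketch of that check, not part of the LM-Kit API: the `IntentSelection` class and its `Choose` method are hypothetical helpers, and the enum is redeclared locally (with the values documented above) only so that the sketch compiles without the library. Falling back to `Undefined` is one possible policy; it relies on the documented behavior that the engine then selects a default intent for the loaded model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented enum, declared here only to keep the sketch standalone.
// In a real project, use LMKit.Extraction.Ocr.VlmOcrIntent instead.
public enum VlmOcrIntent
{
    Undefined = 0,
    PlainText = 1,
    TableRecognition = 2,
    FormulaRecognition = 3,
    ChartRecognition = 4,
    OcrWithCoordinates = 5,
    SealRecognition = 6,
    Markdown = 7
}

// Hypothetical helper (an assumption for illustration, not an LM-Kit type).
public static class IntentSelection
{
    // Returns the preferred intent when it appears in the supported list;
    // otherwise returns Undefined so the engine can pick its own default.
    public static VlmOcrIntent Choose(
        IReadOnlyList<VlmOcrIntent> supported,
        VlmOcrIntent preferred)
    {
        return supported.Contains(preferred) ? preferred : VlmOcrIntent.Undefined;
    }
}
```

With LM-Kit, the `supported` list would come from `VlmOcr.GetSupportedIntents(model)`, and the returned value would be passed to the `VlmOcr` constructor, as in the Examples section.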