Enum VlmOcrIntent

Namespace: LMKit.Extraction.Ocr

Assembly: LM-Kit.NET.dll

Specifies the desired outcome of a VlmOcr operation.

public enum VlmOcrIntent

Fields

Undefined = 0: No explicit intent specified. The engine selects a default intent based on the loaded model: plain-text OCR for models that support it natively (for example, PaddleOCR-VL), Markdown conversion for general-purpose models.
PlainText = 1: Extract text from the image as plain, unformatted text without any markup, coordinates, or structural annotations.
TableRecognition = 2: Detect and extract tabular structures, preserving rows and columns.
FormulaRecognition = 3: Recognize and transcribe mathematical formulas.
ChartRecognition = 4: Interpret charts, graphs, and data visualizations.
OcrWithCoordinates = 5: Extract text together with bounding-box coordinates for each detected region. The coordinate format is model-dependent.
SealRecognition = 6: Recognize and transcribe stamps, seals, and similar graphical marks.
Markdown = 7: Transcribe the page content as Markdown, preserving headings, lists, emphasis, and other structural elements.

Examples

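using System;
using System.Collections.Generic;
using LMKit.Extraction.Ocr;

// `model` is assumed to be a vision-language model that has already been loaded.
// Create a VlmOcr with an explicit intent
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition);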

// Query supported intents for a specific model
IReadOnlyList<VlmOcrIntent> intents = VlmOcr.GetSupportedIntents(model);
foreach (var intent in intents)
{
    Console.WriteLine(intent);
}

Remarks

Each member represents a high-level intent describing what the caller expects from the OCR engine. Not every vision-language model natively supports every intent. The engine maps each intent to the best available instruction and post-processing strategy for the loaded model, so an intent without native support is still handled on a best-effort basis.
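
Because native support varies by model, a caller may want to check the reported intents before committing to one. The following is a minimal sketch of that check, not part of the LM-Kit API: `IntentSelection` and `ChooseIntent` are hypothetical names, and the enum is redeclared locally only so the snippet compiles without the library. It returns the preferred intent when the model reports it, and `Undefined` otherwise so that the engine selects its own default.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in mirroring the documented members so this sketch compiles
// without LM-Kit. In real code, use LMKit.Extraction.Ocr.VlmOcrIntent instead.
public enum VlmOcrIntent
{
    Undefined = 0,
    PlainText = 1,
    TableRecognition = 2,
    FormulaRecognition = 3,
    ChartRecognition = 4,
    OcrWithCoordinates = 5,
    SealRecognition = 6,
    Markdown = 7
}

public static class IntentSelection
{
    // Hypothetical helper (not an LM-Kit method): returns the preferred intent
    // when it appears in the supported list, otherwise Undefined so the engine
    // falls back to its model-specific default.
    public static VlmOcrIntent ChooseIntent(
        IReadOnlyList<VlmOcrIntent> supported,
        VlmOcrIntent preferred)
    {
        foreach (var intent in supported)
        {
            if (intent == preferred)
            {
                return preferred;
            }
        }

        return VlmOcrIntent.Undefined;
    }
}
```

With the library referenced, the `supported` list would presumably come from `VlmOcr.GetSupportedIntents(model)` and the chosen value would be passed to the `VlmOcr` constructor, as in the Examples section. Whether falling back to `Undefined` is preferable to passing the intent anyway depends on the application, since the engine maps every intent to the best strategy available for the loaded model.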
