Enum VlmOcrIntent
Namespace: LMKit.Extraction.Ocr
Assembly: LM-Kit.NET.dll
Specifies the desired outcome of a VlmOcr operation.
public enum VlmOcrIntent
Fields
Undefined = 0: No explicit intent specified. The engine selects a default intent based on the loaded model: plain-text OCR for models that support it natively (for example, PaddleOCR-VL), Markdown conversion for general-purpose models.
PlainText = 1: Extract text from the image as plain, unformatted text without any markup, coordinates, or structural annotations.
TableRecognition = 2: Detect and extract tabular structures, preserving rows and columns.
FormulaRecognition = 3: Recognize and transcribe mathematical formulas.
ChartRecognition = 4: Interpret charts, graphs, and data visualizations.
OcrWithCoordinates = 5: Extract text together with bounding-box coordinates for each detected region. The coordinate format is model-dependent.
SealRecognition = 6: Recognize and transcribe stamps, seals, and similar graphical marks.
Markdown = 7: Transcribe the page content as Markdown, preserving headings, lists, emphasis, and other structural elements.
Examples
// Create a VlmOcr with an explicit intent
var ocr = new VlmOcr(model, VlmOcrIntent.TableRecognition);

// Query supported intents for a specific model
IReadOnlyList<VlmOcrIntent> intents = VlmOcr.GetSupportedIntents(model);
foreach (var intent in intents)
{
    Console.WriteLine(intent);
}
Remarks
Each member represents a high-level intent describing what the caller expects from the OCR engine. Not every vision-language model natively supports every intent. For the loaded model, the engine maps each intent to the best available instruction and post-processing strategy, using its internal logic to get as close as possible to the requested result.
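Because native support varies by model, a caller may want to pick an intent from an ordered list of preferences. The following is a minimal sketch, not part of LM-Kit.NET: `IntentSelection` and `PickIntent` are hypothetical names, and the enum is mirrored locally only so the snippet compiles on its own. It assumes the list returned by `VlmOcr.GetSupportedIntents` is a reasonable signal for which intents to prefer; this page does not state exactly what that list contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented enum so this sketch compiles standalone.
// Remove it when referencing LM-Kit.NET and use LMKit.Extraction.Ocr.VlmOcrIntent.
public enum VlmOcrIntent
{
    Undefined = 0,
    PlainText = 1,
    TableRecognition = 2,
    FormulaRecognition = 3,
    ChartRecognition = 4,
    OcrWithCoordinates = 5,
    SealRecognition = 6,
    Markdown = 7
}

// Hypothetical helper, not an LM-Kit.NET API.
public static class IntentSelection
{
    // Returns the first preferred intent found in the supported list.
    // Falls back to Undefined, which leaves the choice to the engine's default.
    public static VlmOcrIntent PickIntent(
        IReadOnlyList<VlmOcrIntent> supported,
        params VlmOcrIntent[] preferred)
    {
        foreach (var intent in preferred)
        {
            if (supported.Contains(intent))
            {
                return intent;
            }
        }
        return VlmOcrIntent.Undefined;
    }
}

// Possible usage with the calls shown in the Examples section:
//   var supported = VlmOcr.GetSupportedIntents(model);
//   var intent = IntentSelection.PickIntent(
//       supported, VlmOcrIntent.TableRecognition, VlmOcrIntent.Markdown);
//   var ocr = new VlmOcr(model, intent);
```

Since the engine maps every intent to its best available strategy, passing the desired intent directly also works; a helper like this is only useful when the caller wants explicit control over the fallback order.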