Class VlmOcr

Namespace: LMKit.Extraction.Ocr

Assembly: LM-Kit.NET.dll

Provides an OCR engine implementation backed by a multimodal (vision-language) LM.

public sealed class VlmOcr : OcrEngine

Inheritance: object → OcrEngine → VlmOcr

Inherited Members: OcrEngine.OcrStarting, OcrEngine.OcrCompleted, object.Equals(object), object.Equals(object, object), object.GetHashCode(), object.GetType(), object.ReferenceEquals(object, object), object.ToString()

Examples

Example: Transcribe an image as Markdown using a vision-language model

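using LMKit.Extraction.Ocr;
using LMKit.Media.Image;
using LMKit.Model;

// Load a vision-language model and request Markdown-formatted output.
var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model, VlmOcrIntent.Markdown);
using var image = ImageBuffer.LoadAsRGB("invoice.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.PageElement.Text);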

Example: Use VlmOcr as an OCR engine for TextExtraction

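using LMKit.Extraction;
using LMKit.Extraction.Ocr;
using LMKit.Model;

var vlmModel = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(vlmModel); // intent selected automatically from the model family
var extractor = new TextExtraction(extractionModel); // extractionModel: an LM loaded separately for extraction
extractor.OcrEngine = ocr;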

Remarks

VlmOcr uses the vision and text-generation capabilities of the supplied LM instance to transcribe visual content into text. The input can be provided either as an Attachment (for example, a page image in a document-processing pipeline) or directly as an ImageBuffer.

The output format and level of detail can be influenced by the Instruction property, which is sent to the model together with the image. When left empty, the engine relies on model- and configuration-specific defaults for how text should be returned.
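A minimal sketch of steering the transcription through Instruction. It reuses the `gemma3:12b` model ID and the `invoice.png` file from the examples above, and assumes that `Instruction` is a `string` property and that `LM` and `ImageBuffer` live in the `LMKit.Model` and `LMKit.Media.Image` namespaces. The instruction text is illustrative only, not a recommended prompt.

```csharp
using System;
using LMKit.Extraction.Ocr;
using LMKit.Media.Image;
using LMKit.Model;

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model);

// Illustrative instruction (assumption: a plain string); it is sent to the model with the image.
ocr.Instruction = "Transcribe all visible text. Keep each table row on its own line.";

using var image = ImageBuffer.LoadAsRGB("invoice.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.PageElement.Text);
```

Leaving `Instruction` empty falls back to the model- and configuration-specific defaults described above.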

Constructors

VlmOcr(LM): Initializes a new instance of the VlmOcr class using the specified model with a default intent selected automatically based on the model family.

VlmOcr(LM, VlmOcrIntent): Initializes a new instance of the VlmOcr class using the specified model and an explicit OCR intent.
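The two constructors can be combined with GetSupportedIntents to request an explicit intent only when the model is known to handle it. This is a sketch under assumptions: that `GetSupportedIntents` is static, that it returns an enumerable of `VlmOcrIntent`, and that `VlmOcrIntent` is declared in `LMKit.Extraction.Ocr`. The model ID is reused from the examples above.

```csharp
using System;
using System.Linq;
using LMKit.Extraction.Ocr;
using LMKit.Model;

var model = LM.LoadFromModelID("gemma3:12b");

// Assumption: static method returning an enumerable of VlmOcrIntent values.
var supported = VlmOcr.GetSupportedIntents(model).ToList();
foreach (var intent in supported)
{
    Console.WriteLine($"Supported intent: {intent}");
}

// Ask for Markdown explicitly when the model lists it; otherwise let the engine choose a default.
VlmOcr ocr = supported.Contains(VlmOcrIntent.Markdown)
    ? new VlmOcr(model, VlmOcrIntent.Markdown)
    : new VlmOcr(model);

// Intent reports the resolved intent actually in effect.
Console.WriteLine($"Resolved intent: {ocr.Intent}");
```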

Properties

Instruction: Gets or sets the natural-language instruction used to guide the transcription.

Intent: Gets the resolved intent that governs how the OCR engine instructs the model and post-processes its output.

MaximumCompletionTokens: Gets or sets the maximum number of tokens permitted for the OCR transcription output.

Model: Gets the language model instance used by this object.

StripImageMarkup: Gets or sets a value indicating whether Markdown image references should be removed from the transcription output.

StripStyleAttributes: Gets or sets a value indicating whether inline style attributes should be removed from HTML elements in the transcription output.
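A sketch of configuring the output-related properties in one object initializer. It assumes `MaximumCompletionTokens` is an `int` and that the two `Strip*` properties are `bool`, as their descriptions suggest; the token limit of 2048 is an arbitrary illustrative value, not a documented default. Model ID and input file are reused from the examples above.

```csharp
using System;
using LMKit.Extraction.Ocr;
using LMKit.Media.Image;
using LMKit.Model;

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model, VlmOcrIntent.Markdown)
{
    // Cap the transcription length (illustrative value).
    MaximumCompletionTokens = 2048,
    // Drop Markdown image references such as ![logo](...) from the output.
    StripImageMarkup = true,
    // Remove inline style attributes from HTML elements in the output.
    StripStyleAttributes = true
};

using var image = ImageBuffer.LoadAsRGB("invoice.png");
Console.WriteLine(ocr.Run(image).PageElement.Text);
```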

Methods

GetSupportedIntents(LM): Returns the intents that are known to produce dedicated results with the specified model.

Run(Attachment, int, CancellationToken): Runs OCR synchronously on the specified attachment and returns a detailed result.

Run(ImageBuffer, CancellationToken): Runs OCR synchronously on the specified image and returns a detailed result.

RunAsync(Attachment, int, CancellationToken): Asynchronously runs OCR on the specified attachment and returns a detailed result.

RunAsync(OcrParameters, CancellationToken): Asynchronously runs OCR on the specified page using the configured multimodal language model.

RunAsync(ImageBuffer, CancellationToken): Asynchronously runs OCR on the specified image and returns a detailed result.
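A sketch of the asynchronous image overload with a time limit. It assumes `RunAsync(ImageBuffer, CancellationToken)` returns `Task<VlmOcr.VlmOcrResult>` and, following the usual .NET convention, throws `OperationCanceledException` when the token is cancelled; the two-minute timeout is illustrative. Model ID and input file are reused from the examples above.

```csharp
using System;
using System.Threading;
using LMKit.Extraction.Ocr;
using LMKit.Media.Image;
using LMKit.Model;

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model, VlmOcrIntent.Markdown);
using var image = ImageBuffer.LoadAsRGB("invoice.png");

// Cancel the transcription if it takes longer than two minutes (illustrative timeout).
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
string? text = null;
try
{
    VlmOcr.VlmOcrResult result = await ocr.RunAsync(image, cts.Token);
    text = result.PageElement.Text;
    Console.WriteLine(text);
}
catch (OperationCanceledException)
{
    Console.WriteLine("OCR was cancelled before completion.");
}
```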