Class VlmOcr

Namespace
LMKit.Extraction.Ocr
Assembly
LM-Kit.NET.dll

Provides an OCR engine implementation backed by a multimodal language model.

public sealed class VlmOcr : OcrEngine
Inheritance
VlmOcr

Remarks

VlmOcr uses the vision and text-generation capabilities of the supplied LM instance to transcribe visual content into text. The input can be provided either as an Attachment (for example, a page image in a document-processing pipeline) or directly as an ImageBuffer.

The output format and level of detail can be influenced by the Instruction property, which is sent to the model together with the image. When left empty, the engine relies on model- and configuration-specific defaults for how text should be returned.
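As a rough illustration of the remarks above, the following sketch sets Instruction and transcribes an ImageBuffer with RunAsync. It uses only the members listed on this page. The namespaces LMKit.Model (for LM) and LMKit.Media.Image (for ImageBuffer), the instruction wording, and the helper name TranscribeAsync are assumptions made for illustration. The members of the returned result are not documented here, so the sketch simply prints the result object.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Extraction.Ocr;
using LMKit.Media.Image; // assumed namespace for ImageBuffer
using LMKit.Model;       // assumed namespace for LM

public static class VlmOcrSketch
{
    // Hypothetical helper: the caller supplies an already loaded multimodal model
    // and an already decoded image, so no loading API is assumed here.
    public static async Task TranscribeAsync(LM model, ImageBuffer image, CancellationToken cancellationToken)
    {
        var ocr = new VlmOcr(model)
        {
            // Illustrative instruction text; leave Instruction unset to rely on the defaults.
            Instruction = "Transcribe all visible text and preserve line breaks."
        };

        // The result type is not described on this page, so 'var' is used
        // and the object is printed via its ToString() representation.
        var result = await ocr.RunAsync(image, cancellationToken);
        Console.WriteLine(result);
    }
}
```

Passing the model and image in as parameters keeps the sketch independent of how they are created, which this page does not cover.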

Constructors

VlmOcr(LM)

Initializes a new instance of the VlmOcr class using the specified model.

Properties

Instruction

Gets or sets the natural-language instruction used to guide the transcription.

Methods

Run(Attachment, CancellationToken)

Runs OCR synchronously on the specified attachment and returns a detailed result.

Run(ImageBuffer, CancellationToken)

Runs OCR synchronously on the specified image and returns a detailed result.

RunAsync(Attachment, CancellationToken)

Runs OCR asynchronously on the specified attachment and returns a detailed result.

RunAsync(OcrParameters, CancellationToken)

Runs OCR asynchronously on the specified page using the configured multimodal language model.

RunAsync(ImageBuffer, CancellationToken)

Runs OCR asynchronously on the specified image and returns a detailed result.
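
All of the overloads above accept a CancellationToken. The sketch below shows one way a caller might bound the synchronous Run(Attachment, CancellationToken) overload with a timeout using the standard CancellationTokenSource. The namespace LMKit.Data (for Attachment), the helper name TryTranscribe, and the assumption that cancellation surfaces as an OperationCanceledException (the usual .NET convention, not confirmed on this page) are illustrative only.

```csharp
using System;
using System.Threading;
using LMKit.Data;            // assumed namespace for Attachment
using LMKit.Extraction.Ocr;

public static class VlmOcrTimeoutSketch
{
    // Hypothetical helper: returns true if OCR finished before the timeout elapsed.
    public static bool TryTranscribe(VlmOcr ocr, Attachment page, TimeSpan timeout)
    {
        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);

        try
        {
            // The result type is not described on this page, so it is only printed.
            var result = ocr.Run(page, cts.Token);
            Console.WriteLine(result);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Assumption: the engine reports cancellation with the standard exception type.
            return false;
        }
    }
}
```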