Class VlmOcr
- Namespace
- LMKit.Extraction.Ocr
- Assembly
- LM-Kit.NET.dll
Provides an OCR engine backed by a multimodal (vision-language) model loaded as an LM instance.
public sealed class VlmOcr : OcrEngine
- Inheritance
- OcrEngine → VlmOcr
Remarks
VlmOcr uses the vision and text-generation capabilities of the supplied LM instance to transcribe visual content into text. The input can be provided either as an Attachment (for example, a page image in a document-processing pipeline) or directly as an ImageBuffer.
The output format and level of detail can be influenced by the Instruction property, which is sent to the model together with the image. When left empty, the engine relies on model- and configuration-specific defaults for how text should be returned.
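The following minimal synchronous sketch shows the constructor, the Instruction property and the Run(Attachment, CancellationToken) overload together. It makes several assumptions that are not stated on this page. First, `LM` lives in `LMKit.Model` and can be constructed from a local model path. Second, `Attachment` lives in `LMKit.Data` and can be created from an image file path. Third, the model file, the page image and the instruction text are all hypothetical placeholders. The members of the returned result are not documented here, so the sketch only obtains the result and prints it.

```csharp
using System;
using System.Threading;
using LMKit.Data;           // Attachment (namespace assumed)
using LMKit.Extraction.Ocr; // VlmOcr
using LMKit.Model;          // LM (namespace assumed)

class Program
{
    static void Main()
    {
        // Hypothetical path to a vision-capable model; the LM(string) constructor is assumed.
        var model = new LM("models/vision-model.gguf");

        var ocr = new VlmOcr(model)
        {
            // Sent to the model together with the image; leave empty to use the defaults.
            Instruction = "Transcribe all visible text as Markdown, preserving headings and tables."
        };

        // Hypothetical scanned page; the Attachment(string) constructor is assumed.
        var page = new Attachment("scans/page-001.png");

        // Synchronous overload: Run(Attachment, CancellationToken).
        var result = ocr.Run(page, CancellationToken.None);

        // The result type's members are not listed on this page; inspect it before relying on them.
        Console.WriteLine(result);
    }
}
```

Setting Instruction is optional. Changing only this string is the intended way to switch between output styles, such as plain text or Markdown, without changing the rest of the pipeline.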
Constructors
- VlmOcr(LM)
Initializes a new instance of the VlmOcr class using the specified model.
Properties
- Instruction
Gets or sets the natural-language instruction used to guide the transcription.
Methods
- Run(Attachment, CancellationToken)
Runs OCR synchronously on the specified attachment and returns a detailed result.
- Run(ImageBuffer, CancellationToken)
Runs OCR synchronously on the specified image and returns a detailed result.
- RunAsync(Attachment, CancellationToken)
Runs OCR asynchronously on the specified attachment and returns a detailed result.
- RunAsync(OcrParameters, CancellationToken)
Runs OCR asynchronously on the page described by the specified OcrParameters, using the configured multimodal model.
- RunAsync(ImageBuffer, CancellationToken)
Runs OCR asynchronously on the specified image and returns a detailed result.
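The CancellationToken parameter on the asynchronous overloads can enforce a time limit on a slow transcription. The sketch below uses RunAsync(ImageBuffer, CancellationToken) and makes the following assumptions:

- `ImageBuffer` lives in `LMKit.Graphics`.
- The caller has already loaded the image.
- Cancellation surfaces as an `OperationCanceledException`, following the usual .NET convention.

The helper class and method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Extraction.Ocr; // VlmOcr
using LMKit.Graphics;       // ImageBuffer (namespace assumed)

static class OcrHelpers
{
    // Hypothetical helper: transcribes an already-loaded image, giving up after `timeout`.
    public static async Task TranscribeWithTimeoutAsync(VlmOcr ocr, ImageBuffer image, TimeSpan timeout)
    {
        // The token source cancels itself automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            // Asynchronous overload: RunAsync(ImageBuffer, CancellationToken).
            var result = await ocr.RunAsync(image, cts.Token);
            Console.WriteLine(result);
        }
        catch (OperationCanceledException)
        {
            // Assumed cancellation behavior; verify against the library before relying on it.
            Console.WriteLine($"OCR did not finish within {timeout.TotalSeconds} s.");
        }
    }
}
```

The same pattern applies to RunAsync(Attachment, CancellationToken). Pass the attachment instead of the image buffer.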