Class VlmOcr
- Namespace
- LMKit.Extraction.Ocr
- Assembly
- LM-Kit.NET.dll
Provides an OCR engine implementation backed by a multimodal language model.
public sealed class VlmOcr : OcrEngine
- Inheritance
- OcrEngine
- VlmOcr
Examples
Example: Transcribe an image as Markdown using a vision-language model
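// Load a vision-capable model and create an OCR engine that targets Markdown output.
var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model, VlmOcrIntent.Markdown);
// Load the page image; the buffer is disposed at the end of the enclosing scope.
using var image = ImageBuffer.LoadAsRGB("invoice.png");
// Run OCR synchronously and print the transcribed text.
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.PageElement.Text);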
Example: Use VlmOcr as an OCR engine for TextExtraction
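// The vision-language model drives OCR only.
var vlmModel = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(vlmModel);
// extractionModel is a separately loaded LM instance used for the extraction itself (loading not shown).
var extractor = new TextExtraction(extractionModel);
// Route the extractor's OCR work through the VLM-backed engine.
extractor.OcrEngine = ocr;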
Remarks
VlmOcr uses the vision and text-generation capabilities of the supplied LM instance to transcribe visual content into text. The input can be provided either as an Attachment (for example, a page image in a document-processing pipeline) or directly as an ImageBuffer.
The output format and level of detail can be influenced by the Instruction property, which is sent to the model together with the image. When left empty, the engine relies on model- and configuration-specific defaults for how text should be returned.
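The following is a minimal sketch of steering the transcription through the Instruction property. It assumes that Instruction accepts a plain string and that the `LMKit.Model` and `LMKit.Media.Image` namespaces host LM and ImageBuffer; neither assumption is confirmed by this page, and the instruction wording is purely illustrative.

```csharp
using System;
using LMKit.Extraction.Ocr;   // namespace stated on this page
using LMKit.Media.Image;      // assumed namespace for ImageBuffer
using LMKit.Model;            // assumed namespace for LM

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model);

// Assumption: Instruction is a string property. The text below is an
// illustrative prompt, not a recommended or built-in value.
ocr.Instruction = "Transcribe all visible text in reading order. Do not summarize.";

using var image = ImageBuffer.LoadAsRGB("invoice.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.PageElement.Text);
```

Leaving Instruction unset keeps the engine on its model- and configuration-specific defaults, as described above.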
Constructors
- VlmOcr(LM)
Initializes a new instance of the VlmOcr class using the specified model with a default intent selected automatically based on the model family.
- VlmOcr(LM, VlmOcrIntent)
Initializes a new instance of the VlmOcr class using the specified model and an explicit OCR intent.
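As a hedged illustration of choosing between the two constructors, the sketch below checks whether the Markdown intent is listed for the loaded model and falls back to the automatic default otherwise. It assumes that GetSupportedIntents is static and returns an enumerable of VlmOcrIntent values, and that VlmOcrIntent supports equality comparison (for example, as an enum); these details are not stated on this page. The `LMKit.Model` namespace is also an assumption.

```csharp
using System;
using LMKit.Extraction.Ocr;   // namespace stated on this page
using LMKit.Model;            // assumed namespace for LM

var model = LM.LoadFromModelID("gemma3:12b");

// Assumption: static call returning a sequence of VlmOcrIntent values.
bool markdownSupported = false;
foreach (VlmOcrIntent intent in VlmOcr.GetSupportedIntents(model))
{
    Console.WriteLine($"Supported intent: {intent}");
    if (intent == VlmOcrIntent.Markdown)
    {
        markdownSupported = true;
    }
}

// Use the explicit intent when it is listed; otherwise let the engine
// pick its default intent for the model family.
VlmOcr ocr = markdownSupported
    ? new VlmOcr(model, VlmOcrIntent.Markdown)
    : new VlmOcr(model);

// Intent exposes the resolved intent in either case.
Console.WriteLine($"Resolved intent: {ocr.Intent}");
```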
Properties
- Instruction
Gets or sets the natural-language instruction used to guide the transcription.
- Intent
Gets the resolved intent that governs how the OCR engine instructs the model and post-processes its output.
- MaximumCompletionTokens
Gets or sets the maximum number of tokens permitted for the OCR transcription output.
- Model
Gets the language model instance used by this object.
- StripImageMarkup
Gets or sets a value indicating whether Markdown image references should be removed from the transcription output.
- StripStyleAttributes
Gets or sets a value indicating whether inline style attributes should be removed from HTML elements in the transcription output.
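The properties above might be combined as in the following sketch. The property types (an integer token limit and two Boolean flags) are inferred from the descriptions rather than confirmed, the value 2048 is an arbitrary illustration, and the `LMKit.Model` and `LMKit.Media.Image` namespaces are assumptions.

```csharp
using System;
using LMKit.Extraction.Ocr;   // namespace stated on this page
using LMKit.Media.Image;      // assumed namespace for ImageBuffer
using LMKit.Model;            // assumed namespace for LM

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model, VlmOcrIntent.Markdown);

// Assumption: integer property. 2048 is an arbitrary cap for illustration.
ocr.MaximumCompletionTokens = 2048;

// Assumption: Boolean flags. Drop Markdown image references and inline
// style attributes so the output contains text and structure only.
ocr.StripImageMarkup = true;
ocr.StripStyleAttributes = true;

using var image = ImageBuffer.LoadAsRGB("invoice.png");
VlmOcr.VlmOcrResult result = ocr.Run(image);
Console.WriteLine(result.PageElement.Text);
```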
Methods
- GetSupportedIntents(LM)
Returns the intents that are known to produce dedicated results with the specified model.
- Run(Attachment, int, CancellationToken)
Runs OCR synchronously on the specified attachment and returns a detailed result.
- Run(ImageBuffer, CancellationToken)
Runs OCR synchronously on the specified image and returns a detailed result.
- RunAsync(Attachment, int, CancellationToken)
Runs OCR asynchronously on the specified attachment and returns a detailed result.
- RunAsync(OcrParameters, CancellationToken)
Runs OCR asynchronously on the specified page using the configured multimodal language model.
- RunAsync(ImageBuffer, CancellationToken)
Runs OCR asynchronously on the specified image and returns a detailed result.
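A possible way to call the ImageBuffer overload of RunAsync with a timeout is sketched below. It assumes that the method returns a task yielding a VlmOcr.VlmOcrResult and that cancellation surfaces as an OperationCanceledException, which is the usual .NET convention but is not confirmed by this page. The two-minute timeout is arbitrary, and the `LMKit.Model` and `LMKit.Media.Image` namespaces are assumptions.

```csharp
using System;
using System.Threading;
using LMKit.Extraction.Ocr;   // namespace stated on this page
using LMKit.Media.Image;      // assumed namespace for ImageBuffer
using LMKit.Model;            // assumed namespace for LM

var model = LM.LoadFromModelID("gemma3:12b");
var ocr = new VlmOcr(model);
using var image = ImageBuffer.LoadAsRGB("invoice.png");

// Cancel the request automatically if it runs longer than two minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

try
{
    // Assumption: the awaited result has the same type as the one
    // returned by the synchronous Run(ImageBuffer) overload.
    VlmOcr.VlmOcrResult result = await ocr.RunAsync(image, cts.Token);
    Console.WriteLine(result.PageElement.Text);
}
catch (OperationCanceledException)
{
    // Assumption: cancellation follows the standard .NET pattern.
    Console.WriteLine("OCR was cancelled before it completed.");
}
```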