Method RunAsync

Namespace
LMKit.Extraction.Ocr
Assembly
LM-Kit.NET.dll

RunAsync(OcrParameters, CancellationToken)

Executes the OCR process using the provided parameters. Concrete subclasses must override this method to implement specific OCR logic (e.g., calling a third-party OCR library).

public abstract Task<OcrResult> RunAsync(OcrParameters ocrParameters, CancellationToken cancellationToken = default)

Parameters

ocrParameters OcrParameters

An OcrParameters instance that encapsulates the image buffer, any associated attachment metadata, and any additional configuration options.

cancellationToken CancellationToken

A CancellationToken that can be used to cancel the OCR operation at any time.

Returns

Task<OcrResult>

A Task<TResult> that, when completed, provides an OcrResult containing the extracted text, layout information, and any other data produced by the OCR engine.

Examples

public class TesseractOcrEngine : OcrEngine
{
    public override async Task<OcrResult> RunAsync(
        OcrParameters ocrParameters,
        CancellationToken cancellationToken = default)
    {
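        // Forward the token so the underlying recognition call can be canceled.
        string text = await Tesseract.RecognizeAsync(ocrParameters.ImageData, cancellationToken);

        // Wrap the recognized text in the result returned to the caller.
        return new OcrResult(text);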
    }
}

Exceptions

OperationCanceledException

Thrown if the operation is canceled via the provided cancellationToken.
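
For implementations whose recognition work is synchronous, one way to honor the token is to check it between units of work. The following is a minimal sketch rather than LM-Kit code: `CooperativeCancellationSketch`, `RecognizeRegionsAsync`, and the `recognizeRegion` delegate are hypothetical stand-ins for whatever an override would actually call.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class CooperativeCancellationSketch
{
    // Hypothetical stand-in for the body of a RunAsync override:
    // recognizes each image region in turn on a thread-pool thread.
    public static Task<string> RecognizeRegionsAsync(
        IReadOnlyList<byte[]> regions,
        Func<byte[], string> recognizeRegion,
        CancellationToken cancellationToken = default)
    {
        return Task.Run(() =>
        {
            var text = new StringBuilder();
            foreach (byte[] region in regions)
            {
                // Throws OperationCanceledException once cancellation has been requested.
                cancellationToken.ThrowIfCancellationRequested();
                text.AppendLine(recognizeRegion(region));
            }
            return text.ToString();
        }, cancellationToken);
    }
}
```

Checking the token once per region keeps the cancellation latency bounded by the time taken to process a single region.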

Exception

Concrete implementations may throw other exceptions to signal failures in the underlying OCR processing (e.g., I/O errors, service faults, or an invalid image format). Subclasses should document the specific exception types they throw.

RunAsync(Attachment, int, CancellationToken)

Runs OCR on a specific page of the given attachment. This convenience overload handles image extraction from the attachment internally, then delegates to RunOcrAsync(ImageBuffer, Attachment, int, CancellationToken).

public Task<OcrResult> RunAsync(Attachment attachment, int pageIndex, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The Attachment containing the document or image to process (e.g., a PDF, TIFF, or single-page image file).

pageIndex int

The zero-based index of the page within the attachment to run OCR on. For single-page images, use 0.

cancellationToken CancellationToken

A CancellationToken that can be used to cancel the OCR operation.

Returns

Task<OcrResult>

A Task<TResult> that, when completed, provides an OcrResult containing the extracted text and layout information for the specified page.

Examples

Example: Run OCR on the first page of a PDF attachment

var attachment = new Attachment("invoice.pdf");
var engine = new MyOcrEngine();

OcrResult result = await engine.RunAsync(attachment, pageIndex: 0);
Console.WriteLine(result.PageText);

Exceptions

OperationCanceledException

Thrown if the operation is canceled via cancellationToken or by an OcrStarting event subscriber.
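
On the calling side, a timeout can be layered on top of the cancellation token. The sketch below assumes nothing about LM-Kit beyond the signature shown above: `OcrCallSketch` and `RunWithTimeoutAsync` are hypothetical names, and the helper accepts any cancellable call as a delegate so it compiles on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class OcrCallSketch
{
    // Hypothetical helper: runs a cancellable call with a timeout
    // while still honoring the caller's own token.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> ocrCall,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);
        try
        {
            return await ocrCall(cts.Token);
        }
        catch (OperationCanceledException)
            when (cts.IsCancellationRequested && !callerToken.IsCancellationRequested)
        {
            // The linked source was canceled but the caller's token was not,
            // so the timeout fired: report it as a timeout.
            throw new TimeoutException($"OCR did not complete within {timeout}.");
        }
    }
}
```

With the engine and attachment from the example above, a call might look like `await OcrCallSketch.RunWithTimeoutAsync(ct => engine.RunAsync(attachment, 0, ct), TimeSpan.FromSeconds(30))`. A cancellation raised for any other reason, such as by an OcrStarting subscriber, still surfaces as OperationCanceledException because the exception filter does not match it.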