Class TextractOcr

Namespace
LMKit.Integrations.AWS.Ocr.Textract
Assembly
LM-Kit.NET.dll

An OCR engine implementation that leverages AWS Textract to extract text from image data.

public sealed class TextractOcr : OcrEngine
Inheritance
TextractOcr
Remarks

This class extends OcrEngine and implements the necessary AWS Signature Version 4 signing process to call the Textract “DetectDocumentText” API. It converts input image bytes into the required JSON payload, signs the request, and parses out any recognized LINE‐level text blocks returned by Textract. The resulting text is returned inside an OcrResult instance.

To use this class, supply valid AWS credentials (an access key ID and a secret access key) and specify an AWSRegion. You can then assign a TextractOcr instance to the OcrEngine property of a TextExtraction object, or invoke it directly with RunAsync(OcrParameters, CancellationToken).
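
The three steps these remarks describe (building the JSON payload, deriving a Signature Version 4 signing key, and collecting LINE blocks from the response) can be sketched with the .NET base class library alone. This is a minimal illustration and not the library's actual implementation: the TextractSketch class and its method names are hypothetical, and the sample response used below is hand-written rather than captured from Textract. Only the payload shape, the signing-key derivation chain, and the Blocks/BlockType/Text response fields come from the public AWS documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Hypothetical helper for illustration only; it is not part of LM-Kit.NET.
public static class TextractSketch
{
    // Builds the DetectDocumentText request body: {"Document":{"Bytes":"<base64>"}}.
    // Base64 output contains no characters that need JSON escaping.
    public static string BuildPayload(byte[] imageBytes)
    {
        string base64 = Convert.ToBase64String(imageBytes);
        return "{\"Document\":{\"Bytes\":\"" + base64 + "\"}}";
    }

    // HMAC-SHA256 of a UTF-8 string under the given key.
    private static byte[] Hmac(byte[] key, string data)
    {
        using var hmac = new HMACSHA256(key);
        return hmac.ComputeHash(Encoding.UTF8.GetBytes(data));
    }

    // Derives the Signature Version 4 signing key:
    // "AWS4" + secret -> date (yyyyMMdd) -> region -> service -> "aws4_request".
    public static byte[] DeriveSigningKey(
        string secretAccessKey, string dateStamp, string region, string service)
    {
        byte[] kDate = Hmac(Encoding.UTF8.GetBytes("AWS4" + secretAccessKey), dateStamp);
        byte[] kRegion = Hmac(kDate, region);
        byte[] kService = Hmac(kRegion, service);
        return Hmac(kService, "aws4_request");
    }

    // Returns the Text of every block whose BlockType is "LINE", in response order.
    public static List<string> ExtractLines(string responseJson)
    {
        var lines = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(responseJson);

        if (!doc.RootElement.TryGetProperty("Blocks", out JsonElement blocks))
        {
            return lines; // No blocks were returned.
        }

        foreach (JsonElement block in blocks.EnumerateArray())
        {
            bool isLine = block.TryGetProperty("BlockType", out JsonElement type)
                          && type.GetString() == "LINE";
            if (isLine && block.TryGetProperty("Text", out JsonElement text))
            {
                lines.Add(text.GetString() ?? string.Empty);
            }
        }
        return lines;
    }
}
```

In a real request, the derived key signs a string-to-sign built from the canonical request, and the call is sent as a POST with the X-Amz-Target header set to Textract.DetectDocumentText. Those steps are omitted here to keep the sketch short.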

Constructors

TextractOcr(string, string, AWSRegion)

Initializes a new instance of the TextractOcr class with the specified AWS credentials and region.

Properties

Timeout

Gets or sets the timeout duration for HTTP requests to the AWS Textract service.

Methods

RunAsync(OcrParameters, CancellationToken)

Executes the OCR process using the provided parameters. TextractOcr overrides this OcrEngine method to send the image to AWS Textract and return the recognized text in an OcrResult.