Class TesseractOcr

Namespace
LMKit.Integrations.Tesseract
Assembly
LM-Kit.NET.dll

Provides OCR functionality using the Tesseract engine, with optional language and orientation detection, and automatic model download support. Implements IDisposable to release native Tesseract resources.

public sealed class TesseractOcr : OcrEngine, IDisposable
Inheritance
OcrEngine
TesseractOcr
Implements
IDisposable

Examples

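using LMKit.Integrations.Tesseract;

// myVisionModel, ocrParameters, and cancellationToken are assumed
// to be defined by the caller.
using var ocr = new TesseractOcr();

// Optional: enable automatic language detection
ocr.VisionModel = myVisionModel;
ocr.EnableLanguageDetection = true;

var result = await ocr.RunAsync(ocrParameters, cancellationToken);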

Remarks

This class wraps the Tesseract OCR engine and provides additional features such as:

  • Automatic language detection (requires a vision-capable LM model)
  • Automatic page orientation detection and correction
  • Automatic deskewing of scanned documents
  • On-demand downloading of Tesseract traineddata files from Hugging Face

Constructors

TesseractOcr()

Initializes a new instance of the TesseractOcr class using the default model storage directory.

TesseractOcr(string)

Initializes a new instance of the TesseractOcr class with the specified Tesseract resource path.
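The two constructors above might be used as in the following sketch. The "tessdata" folder name is hypothetical, and the assumption that the resource path points at a directory holding traineddata files is not confirmed by this page.

```csharp
using System;
using System.IO;
using LMKit.Integrations.Tesseract;

// Uses the default model storage directory.
using var defaultOcr = new TesseractOcr();

// Hypothetical location: a "tessdata" folder next to the application binaries.
// What the path must contain is an assumption, not documented here.
string tessdataPath = Path.Combine(AppContext.BaseDirectory, "tessdata");
using var customOcr = new TesseractOcr(tessdataPath);
```

Declaring each instance with `using var` ensures Dispose() is called when the variable goes out of scope, which releases the native Tesseract resources.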

Properties

DefaultLanguage

Gets or sets the default three-letter ISO 639-2/T language code (for example, "eng" for English or "deu" for German) used when a specific language model is not available or language detection is disabled.

EnableAutoDeskew

Gets or sets a value indicating whether automatic deskewing is applied to the input image before OCR.

EnableLanguageDetection

Gets or sets a value indicating whether automatic language detection is performed before OCR.

EnableModelDownload

Gets or sets a value indicating whether missing Tesseract traineddata files should be automatically downloaded.

EnableOrientationDetection

Gets or sets a value indicating whether automatic orientation detection is performed before OCR.

VisionModel

Gets or sets the vision-capable LM model used for automatic language detection.
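As an illustration, the properties above could be set together in an object initializer. This sketch assumes that DefaultLanguage is a string and that the Enable* properties are bool, as their descriptions suggest. The default values are not stated on this page, so every option is set explicitly.

```csharp
using LMKit.Integrations.Tesseract;

// Assumed types: string for DefaultLanguage, bool for the Enable* properties.
using var ocr = new TesseractOcr
{
    DefaultLanguage = "eng",            // ISO 639-2/T code for English
    EnableAutoDeskew = true,            // straighten skewed scans before OCR
    EnableOrientationDetection = true,  // detect page orientation before OCR
    EnableModelDownload = true,         // fetch missing traineddata files on demand
    EnableLanguageDetection = false     // left off because no VisionModel is assigned
};
```

Language detection stays disabled here because it requires a vision-capable model assigned to VisionModel. With it disabled, DefaultLanguage determines the language used.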

Methods

Dispose()

Releases all resources used by this TesseractOcr instance.

RunAsync(OcrParameters, CancellationToken)

Runs OCR on the provided image data asynchronously.
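The sketch below shows one way to bound a RunAsync call with a timeout and to guarantee disposal. The helper class and method names are hypothetical, the 30-second limit is arbitrary, and the assumption that cancellation surfaces as an OperationCanceledException follows the usual .NET convention rather than anything stated on this page. The using directive for the namespace that declares OcrParameters is omitted because this page does not name it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Integrations.Tesseract;

static class OcrTimeoutExample // hypothetical helper, not part of the library
{
    // The caller supplies an already configured OcrParameters instance.
    public static async Task RunWithTimeoutAsync(OcrParameters ocrParameters)
    {
        using var ocr = new TesseractOcr();

        // Request cancellation automatically after 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        try
        {
            var result = await ocr.RunAsync(ocrParameters, cts.Token);
            Console.WriteLine("OCR completed.");
            // Inspect result here.
        }
        catch (OperationCanceledException)
        {
            // Assumed behavior: RunAsync honors the token and throws on cancellation.
            Console.WriteLine("OCR was canceled before it finished.");
        }
    }
}
```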

Events

LanguageDetected

Occurs when a language is detected during OCR processing.

OrientationDetected

Occurs when page orientation is detected during OCR processing.
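Subscribing to the two events might look like the sketch below. The delegate types are not shown on this page, so the two-parameter (sender, e) handler shape is an assumption based on the standard .NET event pattern. The handlers deliberately read nothing from the event arguments.

```csharp
using System;
using LMKit.Integrations.Tesseract;

using var ocr = new TesseractOcr();
ocr.EnableOrientationDetection = true;

// Assumed handler shape: (object sender, SomeEventArgs e).
ocr.LanguageDetected += (sender, e) =>
    Console.WriteLine("A language was detected.");

ocr.OrientationDetected += (sender, e) =>
    Console.WriteLine("Page orientation was detected.");
```

Handlers should be attached before RunAsync is called so that detections raised during processing are not missed.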