Class TesseractOcr
- Namespace
- LMKit.Integrations.Tesseract
- Assembly
- LM-Kit.NET.dll
Provides OCR functionality using the Tesseract engine, with optional language and orientation detection, and automatic model download support. Implements IDisposable to release native Tesseract resources.
public sealed class TesseractOcr : OcrEngine, IDisposable
- Inheritance
- OcrEngine
TesseractOcr
- Implements
- IDisposable
Examples
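// Assumes myVisionModel, ocrParameters, and cancellationToken are defined by the caller.
using var ocr = new TesseractOcr();
// Optional: enable automatic language detection (requires a vision-capable model)
ocr.VisionModel = myVisionModel;
ocr.EnableLanguageDetection = true;
// Run OCR; the instance is disposed when it goes out of scope
var result = await ocr.RunAsync(ocrParameters, cancellationToken);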
Remarks
This class wraps the Tesseract OCR engine and provides additional features such as:
- Automatic language detection (requires VisionModel to be set to a vision-capable LM model)
- Automatic page orientation detection and correction
- Automatic deskewing of scanned documents
- On-demand downloading of Tesseract traineddata files from Hugging Face
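The features listed above map onto the properties documented on this page. The following is a minimal sketch of switching them on with an object initializer. It assumes the Enable* properties are of type bool and that DefaultLanguage is a string, neither of which this page states explicitly. The helper class and method names are hypothetical.

```csharp
using LMKit.Integrations.Tesseract;

static class OcrFeatureSketch
{
    // Hypothetical helper: returns an engine with the optional
    // preprocessing features enabled. The caller owns the instance
    // and is responsible for disposing it.
    public static TesseractOcr CreateConfigured()
    {
        return new TesseractOcr
        {
            // Assumed to be bool properties.
            EnableOrientationDetection = true,
            EnableAutoDeskew = true,
            EnableModelDownload = true,

            // Assumed to be a string holding an ISO 639-2/T code.
            DefaultLanguage = "eng"
        };
    }
}
```

Language detection is left out here because it also requires a VisionModel, as shown in the Examples section.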
Constructors
- TesseractOcr()
Initializes a new instance of the TesseractOcr class using the default model storage directory.
- TesseractOcr(string)
Initializes a new instance of the TesseractOcr class with the specified Tesseract resource path.
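As an illustration of the second overload, the sketch below points the engine at an application-local directory. The "tessdata" folder name and the helper names are hypothetical; this page does not say what layout the resource path is expected to have.

```csharp
using System;
using System.IO;
using LMKit.Integrations.Tesseract;

static class OcrConstructorSketch
{
    // Hypothetical helper: creates an engine that reads its Tesseract
    // resources from a folder next to the application binaries.
    public static TesseractOcr CreateWithLocalResources()
    {
        // "tessdata" is an illustrative directory name, not a documented default.
        string resourcePath = Path.Combine(AppContext.BaseDirectory, "tessdata");
        return new TesseractOcr(resourcePath);
    }
}
```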
Properties
- DefaultLanguage
Gets or sets the default ISO 639-2/T language code (for example, "eng" for English) used when a specific language model is not available or language detection is disabled.
- EnableAutoDeskew
Gets or sets a value indicating whether automatic deskewing is applied to the input image before OCR.
- EnableLanguageDetection
Gets or sets a value indicating whether automatic language detection is performed before OCR.
- EnableModelDownload
Gets or sets a value indicating whether missing Tesseract traineddata files should be automatically downloaded.
- EnableOrientationDetection
Gets or sets a value indicating whether automatic orientation detection is performed before OCR.
- VisionModel
Gets or sets the vision-capable LM model used for automatic language detection.
Methods
- Dispose()
Releases all resources used by this TesseractOcr instance.
- RunAsync(OcrParameters, CancellationToken)
Runs OCR on the provided image data asynchronously.
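A sketch of calling RunAsync with a time limit enforced through the cancellation token. It assumes OcrParameters lives in the LMKit.Extraction.Ocr namespace, which this page does not confirm, and it leaves the result type unnamed because the return type is not documented here. The helper names and the 30-second limit are illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Extraction.Ocr;          // assumed namespace of OcrParameters
using LMKit.Integrations.Tesseract;

static class OcrRunSketch
{
    // Hypothetical helper: runs OCR and requests cancellation after 30 seconds.
    public static async Task RunWithTimeoutAsync(OcrParameters parameters)
    {
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        using var ocr = new TesseractOcr();

        // The token is passed through so the engine can observe cancellation.
        var result = await ocr.RunAsync(parameters, cts.Token);

        // Inspect or return the result here; its members are not described on this page.
    }
}
```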
Events
- LanguageDetected
Occurs when a language is detected during OCR processing.
- OrientationDetected
Occurs when page orientation is detected during OCR processing.
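The events can be subscribed to before calling RunAsync. The sketch below assumes both events follow the conventional .NET (sender, args) handler shape. The event argument types are not listed on this page, so the handlers do not read any members from them. The helper names are hypothetical.

```csharp
using System;
using LMKit.Integrations.Tesseract;

static class OcrEventSketch
{
    // Hypothetical helper: logs a line whenever either event fires.
    public static void AttachLogging(TesseractOcr ocr)
    {
        // Assumes a two-parameter (sender, e) delegate for both events.
        ocr.LanguageDetected += (sender, e) =>
            Console.WriteLine("Language detected.");

        ocr.OrientationDetected += (sender, e) =>
            Console.WriteLine("Page orientation detected.");
    }
}
```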