Class LMKitOcr
- Namespace
- LMKit.Extraction.Ocr
- Assembly
- LM-Kit.NET.dll
Provides high-throughput OCR functionality optimized for business documents, with advanced page layout analysis, automatic language and orientation detection, and automatic model download support. Implements IDisposable to release native OCR resources.
public class LMKitOcr : OcrEngine, IDisposable
- Inheritance
- OcrEngine
- LMKitOcr
- Implements
- IDisposable
Examples
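// Dispose the engine when done; it holds native OCR resources.
using var ocr = new LMKitOcr();
// Optional: enable automatic language detection (requires a vision-capable model).
ocr.VisionModel = myVisionModel;
ocr.EnableLanguageDetection = true;
// myVisionModel, ocrParameters, and cancellationToken are supplied by the caller.
var result = await ocr.RunAsync(ocrParameters, cancellationToken);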
Remarks
LM-Kit OCR is engineered for speed, accuracy, and complex page layout handling. It delivers very high accuracy on business documents such as invoices, contracts, reports, and forms, while maintaining high throughput for batch processing scenarios.
Key capabilities:
- High-throughput processing optimized for large-scale document workflows
- Very high accuracy on business documents (invoices, contracts, reports, forms)
- Complex page layout handling with intelligent reading order reconstruction
- Automatic language detection (requires a vision-capable LM model)
- Automatic page orientation detection and correction
- Automatic deskewing of scanned documents
- On-demand downloading of OCR dictionaries from Hugging Face
Constructors
- LMKitOcr()
Initializes a new instance of the LMKitOcr class using the default model storage directory.
- LMKitOcr(string)
Initializes a new instance of the LMKitOcr class with the specified OCR resource path.
Properties
- DefaultLanguage
Gets or sets the default ISO 639-2/T language code used when a specific language model is not available or language detection is disabled.
- EnableAutoDeskew
Gets or sets a value indicating whether automatic deskewing is applied to the input image before OCR.
- EnableDespeckle
Gets or sets a value indicating whether speckle noise removal is applied to the binarized image before OCR recognition.
- EnableLanguageDetection
Gets or sets a value indicating whether automatic language detection is performed before OCR.
- EnableModelDownload
Gets or sets a value indicating whether missing OCR dictionaries should be automatically downloaded.
- EnableOrientationDetection
Gets or sets a value indicating whether automatic orientation detection is performed before OCR.
- EnableSmartBinarization
Gets or sets a value indicating whether adaptive (smart) binarization is used to convert the input image to a binary (black-and-white) representation.
- VisionModel
Gets or sets the vision-capable LM model used for automatic language detection.
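The properties above can be set together with an object initializer. The following is a minimal configuration sketch, not an official recipe: the `OcrFactory` class and `CreateForScans` method are hypothetical names introduced for illustration, the `Enable*` properties are assumed to be `bool` (as their descriptions suggest), and `DefaultLanguage` is assumed to accept a string such as `"eng"`, the ISO 639-2/T code for English.

```csharp
using LMKit.Extraction.Ocr;

// Hypothetical helper, not part of LM-Kit.
public static class OcrFactory
{
    // Builds an engine configured for noisy, possibly rotated scans.
    // The caller owns the returned instance and must dispose it.
    public static LMKitOcr CreateForScans()
    {
        return new LMKitOcr
        {
            DefaultLanguage = "eng",             // assumption: string-typed ISO 639-2/T code
            EnableAutoDeskew = true,             // straighten skewed scans before OCR
            EnableDespeckle = true,              // remove speckle noise after binarization
            EnableSmartBinarization = true,      // adaptive black-and-white conversion
            EnableOrientationDetection = true,   // detect page orientation before OCR
            EnableModelDownload = true,          // fetch missing OCR dictionaries automatically
            EnableLanguageDetection = false      // left off because no VisionModel is assigned
        };
    }
}
```

Language detection is left disabled in this sketch because it requires a vision-capable model to be assigned to `VisionModel`; with detection off, `DefaultLanguage` is the language used.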
Methods
- ClearCache()
Clears all cached OCR engines and releases their associated resources.
- RunAsync(OcrParameters, CancellationToken)
Runs OCR on the provided image data asynchronously.
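Because `RunAsync` accepts a `CancellationToken`, a long-running OCR call can be bounded by a timeout. The sketch below uses only standard .NET types; `OcrTimeout` and `RunWithTimeoutAsync` are hypothetical names, not LM-Kit members. It assumes that the wrapped operation returns a `Task<T>` and observes the token it receives; how `RunAsync` reacts to cancellation is not stated on this page.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of LM-Kit: runs a cancellable async
// operation under a time limit linked to the caller's own token.
public static class OcrTimeout
{
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        // The linked source is cancelled when either the caller cancels
        // or the timeout elapses, whichever happens first.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);
        return await operation(cts.Token).ConfigureAwait(false);
    }
}
```

Under those assumptions, a call might look like `var result = await OcrTimeout.RunWithTimeoutAsync(ct => ocr.RunAsync(ocrParameters, ct), TimeSpan.FromSeconds(30), cancellationToken);`. If the operation honors the token, an `OperationCanceledException` would typically surface to the caller once the limit is reached.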
Events
- LanguageDetected
Occurs when a language is detected during OCR processing.
- OrientationDetected
Occurs when page orientation is detected during OCR processing.