Class LMKitOcr

Namespace: LMKit.Extraction.Ocr

Assembly: LM-Kit.NET.dll

Provides high-throughput OCR functionality optimized for business documents, with advanced page layout analysis, automatic language and orientation detection, and automatic model download support. Implements IDisposable to release native OCR resources.

public class LMKitOcr : OcrEngine, IDisposable

Inheritance: object → OcrEngine → LMKitOcr

Implements: IDisposable

Inherited Members: OcrEngine.OcrStarting

OcrEngine.OcrCompleted

OcrEngine.RunAsync(Attachment, int, CancellationToken, IList<Language>, bool?)

OcrEngine.OnOcrStarting(OcrStartingEventArgs)

OcrEngine.OnOcrCompleted(OcrCompletedEventArgs)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Examples

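// Dispose the engine when done so native OCR resources are released.
using var ocr = new LMKitOcr();

// Optional: enable automatic language detection.
// myVisionModel is a placeholder for a vision-capable model loaded elsewhere.
ocr.VisionModel = myVisionModel;
ocr.EnableLanguageDetection = true;

// ocrParameters and cancellationToken are placeholders supplied by the caller.
var result = await ocr.RunAsync(ocrParameters, cancellationToken);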

Remarks

LM-Kit OCR is engineered for speed, accuracy, and complex page layouts. It targets business documents such as invoices, contracts, reports, and forms, and sustains high throughput in batch processing scenarios.

Key capabilities:

High-throughput processing optimized for large-scale document workflows
Very high accuracy on business documents (invoices, contracts, reports, forms)
Complex page layout handling with intelligent reading order reconstruction
Automatic language detection (requires a vision-capable LM model)
Automatic page orientation detection and correction
Automatic deskewing of scanned documents
On-demand downloading of OCR dictionaries from Hugging Face

Constructors

LMKitOcr(): Initializes a new instance of the LMKitOcr class using the default model storage directory.

LMKitOcr(string): Initializes a new instance of the LMKitOcr class with the specified OCR resource path.

Properties

ActiveProcessCount: Gets the number of OCR operations currently executing across all LMKitOcr instances.

CachedEngineCount: Gets the number of OCR engines currently idle in the reuse cache. Read this value before calling ClearCache() to report how many engines the call releases.

DefaultLanguage: Gets or sets the default ISO 639-2/T language code used when a specific language model is not available or language detection is disabled.

EnableAutoDeskew: Gets or sets a value indicating whether automatic deskewing is applied to the input image before OCR.

EnableDespeckle: Gets or sets a value indicating whether speckle noise removal is applied to the binarized image before OCR recognition.

EnableLanguageDetection: Gets or sets a value indicating whether automatic language detection is performed before OCR.

EnableModelDownload: Gets or sets a value indicating whether missing OCR dictionaries should be automatically downloaded.

EnableSmartBinarization: Gets or sets a value indicating whether adaptive (smart) binarization is used to convert the input image to a binary (black-and-white) representation.

MaxConcurrentProcesses: Gets or sets the maximum number of OCR operations allowed to run concurrently across all LMKitOcr instances.
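A limit that applies across all instances implies a gate shared by every caller. The sketch below illustrates that idea with a SemaphoreSlim and an interlocked counter. SharedOcrGate and its members are hypothetical names introduced for illustration; this is not LM-Kit's implementation, and unlike MaxConcurrentProcesses the limit here is fixed at construction.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration, not LM-Kit code: one gate shared by every caller.
public sealed class SharedOcrGate
{
    private readonly SemaphoreSlim _slots;
    private int _active;

    public SharedOcrGate(int maxConcurrent)
    {
        if (maxConcurrent < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrent));
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Number of operations currently running inside the gate.
    public int ActiveCount => Volatile.Read(ref _active);

    // Waits for a free slot, runs the work, then frees the slot even on failure.
    public async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> work,
        CancellationToken ct = default)
    {
        await _slots.WaitAsync(ct).ConfigureAwait(false);
        Interlocked.Increment(ref _active);
        try
        {
            return await work(ct).ConfigureAwait(false);
        }
        finally
        {
            Interlocked.Decrement(ref _active);
            _slots.Release();
        }
    }
}
```

With a gate of this shape, callers can queue any number of jobs while at most maxConcurrent of them execute at once, and ActiveCount plays the role that ActiveProcessCount plays in this class.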

MaxSupportedConcurrentProcesses: Gets the maximum value MaxConcurrentProcesses can take on the current host: the number of logical CPU processors (vCPUs), capped at 24. This is also the default value of MaxConcurrentProcesses and the ceiling to which any assigned value is clamped.
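The ceiling rule described above can be written out as a small calculation. This is a minimal sketch of the documented behavior, not the library's code: ConcurrencyLimits is a hypothetical helper, and the lower bound of 1 is an assumption, since only the upper clamp is documented.

```csharp
using System;

// Hypothetical helper, not part of LM-Kit: mirrors the documented rule only.
public static class ConcurrencyLimits
{
    // Documented ceiling on the logical processor count.
    public const int HardCap = 24;

    // Logical processor count, capped at 24.
    public static int MaxSupported(int logicalProcessors) =>
        Math.Min(Math.Max(logicalProcessors, 1), HardCap);

    // Assumption: requests below 1 are raised to 1; the upper clamp is documented.
    public static int ClampRequested(int requested, int logicalProcessors) =>
        Math.Clamp(requested, 1, MaxSupported(logicalProcessors));
}
```

For example, ConcurrencyLimits.MaxSupported(Environment.ProcessorCount) evaluates to 8 on an 8-vCPU host and to 24 on a 64-vCPU host, and a request for 100 concurrent operations on the 8-vCPU host clamps to 8.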

VisionModel: Gets or sets the vision-capable LM model used for automatic language detection.

Methods

ClearCache(): Clears all cached OCR engines and releases their associated resources.
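To make the relationship between an idle-engine count and a cache clear concrete, here is a generic reuse cache in miniature. IdleCache and its members are hypothetical and only illustrate the pattern; how LMKitOcr actually caches engines is not documented here.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical illustration, not LM-Kit code: a minimal idle-object reuse cache.
public sealed class IdleCache<T> where T : class, IDisposable
{
    private readonly ConcurrentBag<T> _idle = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public IdleCache(Func<T> factory) =>
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));

    // Items currently idle and available for reuse.
    public int CachedCount => _idle.Count;

    // Reuse an idle item when one exists; otherwise create a new one.
    public T Rent() => _idle.TryTake(out var item) ? item : _factory();

    // Hand an item back so a later Rent() can reuse it.
    public void Return(T item) =>
        _idle.Add(item ?? throw new ArgumentNullException(nameof(item)));

    // Dispose every idle item and report how many were released.
    public int Clear()
    {
        int released = 0;
        while (_idle.TryTake(out var item))
        {
            item.Dispose();
            released++;
        }
        return released;
    }
}
```

In this sketch Clear() returns the number of released items itself. ClearCache() is not documented as returning a count, which is why CachedEngineCount is read beforehand when that number is needed.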

Dispose(): Releases all resources used by this LMKitOcr instance.

RunAsync(OcrParameters, CancellationToken): Runs OCR on the provided image data asynchronously.

Events

LanguageDetected: Occurs when a language is detected during OCR processing.

OrientationDetected: Occurs when page orientation is detected during OCR processing.
