
Class SearchHighlightEngine

Namespace
LMKit.Document.Search
Assembly
LM-Kit.NET.dll

Searches text in a paginated document and produces a highlighted copy. For PDF input, adds highlight annotations and saves incrementally for maximum performance. For image input, draws semi-transparent rectangles over matches and outputs PNG.

public static class SearchHighlightEngine
Inheritance
SearchHighlightEngine

Examples

Search a PDF for a keyword and save the highlighted copy:

using System;
using System.IO;
using System.Linq;
using LMKit.Document.Search;

// Search for "total" in a PDF and produce a highlighted copy.
SearchHighlightResult result = SearchHighlightEngine.Highlight("invoice.pdf", "total");

Console.WriteLine($"Matches found: {result.TotalMatches}");
Console.WriteLine($"Pages scanned: {result.ScannedPages}/{result.PageCount}");

// Save the highlighted output next to the original file.
File.WriteAllBytes("invoice_highlighted.pdf", result.OutputData);

// Print the first 10 matches.
foreach (TextMatch match in result.Matches.Take(10))
{
    Console.WriteLine($"  Page {match.PageIndex + 1}: \"{match.Text}\"");
}

Use regex search mode to find all monetary amounts:

using System;
using System.IO;
using LMKit.Document.Search;

var options = new SearchHighlightOptions { SearchMode = SearchMode.Regex };

// The pattern matches a dollar sign, digits with optional commas, and two decimals.
SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "financial_report.pdf",
    @"\$[\d,]+\.\d{2}",
    options);

Console.WriteLine($"Found {result.TotalMatches} monetary values.");
File.WriteAllBytes("financial_report_highlighted.pdf", result.OutputData);
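To see what the monetary pattern selects before running it against a document, it can be tried on a plain string with the standard .NET regex engine. This is a minimal sketch that does not use LM-Kit at all; `MoneyPatternSketch` and `FindAmounts` are hypothetical names, and it assumes the library's regex mode follows .NET regular expression syntax, which the page does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of LM-Kit: applies the same pattern
// to an in-memory string using System.Text.RegularExpressions.
public static class MoneyPatternSketch
{
    // Dollar sign, one or more digits or commas, a dot, exactly two digits.
    public const string Pattern = @"\$[\d,]+\.\d{2}";

    public static List<string> FindAmounts(string text)
    {
        var amounts = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            amounts.Add(m.Value);
        }
        return amounts;
    }
}
```

For the input `"Total: $1,234.56, tax $99.00, fee $5"`, the helper returns `$1,234.56` and `$99.00`; `$5` is skipped because the pattern requires two decimal places.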

Use fuzzy search to find approximate matches:

using System;
using System.IO;
using LMKit.Document.Search;

var options = new SearchHighlightOptions { SearchMode = SearchMode.Fuzzy, MaxEditDistance = 2 };

SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "scanned_contract.pdf",
    "indemnification",
    options);

// Fuzzy search catches OCR artifacts and minor typos.
foreach (TextMatch match in result.Matches)
{
    Console.WriteLine($"  Page {match.PageIndex + 1}: \"{match.Text}\" (score: {match.Score:F2})");
}

File.WriteAllBytes("scanned_contract_highlighted.pdf", result.OutputData);

Remarks

Supports three search modes via SearchMode: exact text, regular expression, and fuzzy (edit-distance) matching, all backed by LayoutSearchEngine.
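As an illustration of what an edit-distance threshold such as `MaxEditDistance = 2` means, the sketch below computes the classic Levenshtein distance between two strings. It is a standalone example, not LM-Kit code: `EditDistanceSketch`, `Levenshtein`, and `IsWithinDistance` are hypothetical names, and the page does not say which distance metric or tokenization `LayoutSearchEngine` actually uses.

```csharp
using System;

// Hypothetical helper, not part of LM-Kit: textbook Levenshtein distance.
public static class EditDistanceSketch
{
    // Minimum number of single-character insertions, deletions,
    // or substitutions needed to turn `a` into `b`.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Distance from the empty prefix of `a` to each prefix of `b`.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            (previous, current) = (current, previous);
        }

        return previous[b.Length];
    }

    // True when `candidate` is within `maxEditDistance` edits of `query`.
    public static bool IsWithinDistance(string candidate, string query, int maxEditDistance) =>
        Levenshtein(candidate, query) <= maxEditDistance;
}
```

A typical OCR confusion reads "m" as "rn": `Levenshtein("indemnification", "indernnification")` is 2 (one substitution plus one insertion), so under this metric the misread word would still fall within a maximum edit distance of 2.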

When pre-computed PageElement instances are supplied (for example, from a prior OCR or layout analysis pass), they are used for search instead of the document's native text extraction. This enables highlighting on raster PDFs or images whose text was obtained externally.

Methods

Highlight(Attachment, string, SearchHighlightOptions, IReadOnlyList<PageElement>, CancellationToken)

Searches text in an Attachment and returns a highlighted copy.

Highlight(string, string, SearchHighlightOptions, CancellationToken)

Searches text in a document file and returns a highlighted copy.

HighlightAsync(Attachment, string, SearchHighlightOptions, IReadOnlyList<PageElement>, CancellationToken)

Asynchronously searches text in an Attachment and returns a highlighted copy.

HighlightAsync(string, string, SearchHighlightOptions, CancellationToken)

Asynchronously searches text in a document file and returns a highlighted copy.
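The asynchronous file overload listed above could be called along the following lines. This is a sketch under stated assumptions rather than confirmed usage: the argument order follows the signature shown on this page, but it assumes that `HighlightAsync` returns an awaitable producing a `SearchHighlightResult` and that cancellation surfaces as an `OperationCanceledException`, neither of which the page confirms. `AsyncHighlightSketch`, `RunAsync`, and the 30-second timeout are illustrative choices.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Document.Search;

// Hypothetical wrapper class for illustration only.
public static class AsyncHighlightSketch
{
    public static async Task RunAsync()
    {
        // Cancel the search automatically if it runs longer than 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        try
        {
            // Assumption: the awaited value is a SearchHighlightResult.
            var result = await SearchHighlightEngine.HighlightAsync(
                "invoice.pdf",
                "total",
                new SearchHighlightOptions(),
                cts.Token);

            Console.WriteLine($"Matches found: {result.TotalMatches}");
            File.WriteAllBytes("invoice_highlighted.pdf", result.OutputData);
        }
        catch (OperationCanceledException)
        {
            // Assumption: a cancelled search throws rather than returning a partial result.
            Console.WriteLine("Search was cancelled before it completed.");
        }
    }
}
```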
