Class SearchHighlightEngine
Searches text in a paginated document and produces a highlighted copy. For PDF input, adds highlight annotations and saves incrementally for maximum performance. For image input, draws semi-transparent rectangles over matches and outputs PNG.
public static class SearchHighlightEngine
- Inheritance
  object → SearchHighlightEngine
Examples
Search a PDF for a keyword and save the highlighted copy:
using System;
using System.IO;
using System.Linq;
using LMKit.Document.Search;

// Search for "total" in a PDF and produce a highlighted copy.
SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "invoice.pdf",
    "total");

Console.WriteLine($"Matches found: {result.TotalMatches}");
Console.WriteLine($"Pages scanned: {result.ScannedPages}/{result.PageCount}");

// Save the highlighted output next to the original file.
File.WriteAllBytes("invoice_highlighted.pdf", result.OutputData);

// Print the first 10 matches (PageIndex is zero-based, so add 1 for display).
foreach (TextMatch match in result.Matches.Take(10))
{
    Console.WriteLine($"  Page {match.PageIndex + 1}: \"{match.Text}\"");
}
Use regex search mode to find all monetary amounts:
using System;
using System.IO;
using LMKit.Document.Search;

// Switch from the default mode to regular-expression matching.
var options = new SearchHighlightOptions
{
    SearchMode = SearchMode.Regex
};

// The pattern matches a dollar sign, digits with optional commas, and two decimals.
SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "financial_report.pdf",
    @"\$[\d,]+\.\d{2}",
    options);

Console.WriteLine($"Found {result.TotalMatches} monetary values.");
File.WriteAllBytes("financial_report_highlighted.pdf", result.OutputData);
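Before running a pattern against a whole document, it can help to check what it accepts on a short string. The sketch below applies the same pattern with .NET's System.Text.RegularExpressions. It assumes the engine interprets patterns with .NET regex semantics, which this page does not state; AmountPatternSketch and FindAmounts are hypothetical names and are not part of LMKit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of LMKit: checks the pattern on plain strings.
public static class AmountPatternSketch
{
    // Same pattern as in the example above.
    public const string Pattern = @"\$[\d,]+\.\d{2}";

    // Returns every substring of text that the pattern matches, in order.
    public static List<string> FindAmounts(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the input "Subtotal $1,234.50, tax $98.7, total $1,333.20", FindAmounts returns "$1,234.50" and "$1,333.20". "$98.7" is skipped because the pattern requires exactly two digits after the decimal point. The character class [\d,]+ is permissive about comma placement, so a stricter pattern may be worth considering if malformed amounts matter.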
Use fuzzy search to find approximate matches:
using System;
using System.IO;
using LMKit.Document.Search;

// Allow matches that differ from the query by up to two edits.
var options = new SearchHighlightOptions
{
    SearchMode = SearchMode.Fuzzy,
    MaxEditDistance = 2
};

SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "scanned_contract.pdf",
    "indemnification",
    options);

// Fuzzy search catches OCR artifacts and minor typos.
foreach (TextMatch match in result.Matches)
{
    Console.WriteLine($"  Page {match.PageIndex + 1}: \"{match.Text}\" (score: {match.Score:F2})");
}

File.WriteAllBytes("scanned_contract_highlighted.pdf", result.OutputData);
Remarks
Supports three search modes via SearchMode: exact text, regular expression, and fuzzy (edit-distance) matching, all backed by LayoutSearchEngine.
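To give a feel for what an edit-distance threshold such as MaxEditDistance = 2 allows, here is a minimal sketch of the classic Levenshtein distance. It assumes the fuzzy mode counts single-character insertions, deletions, and substitutions; this page does not specify the exact metric or any case handling, and EditDistanceSketch is a hypothetical class, not the LayoutSearchEngine implementation.

```csharp
using System;

// Hypothetical illustration, not the library's implementation.
public static class EditDistanceSketch
{
    // Levenshtein distance computed with two rolling rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Turning an empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            // Turning the first i characters of a into an empty string takes i deletions.
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitutionCost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + substitutionCost);
            }

            // The row just filled becomes the previous row for the next iteration.
            (previous, current) = (current, previous);
        }

        return previous[b.Length];
    }

    // True when candidate is within maxEditDistance edits of query.
    public static bool IsWithin(string candidate, string query, int maxEditDistance)
        => Levenshtein(candidate, query) <= maxEditDistance;
}
```

Under this metric, the OCR misread "indernnification" (where "m" was read as "rn") is two edits away from "indemnification", so a threshold of 2 would accept it, while a threshold of 1 would not.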
When pre-computed PageElement instances are supplied (for example, from a prior OCR or layout analysis pass), they are used for search instead of the document's native text extraction. This enables highlighting on raster PDFs or images whose text was obtained externally.
Methods
- Highlight(Attachment, string, SearchHighlightOptions, IReadOnlyList<PageElement>, CancellationToken)
Searches text in an Attachment and returns a highlighted copy.
- Highlight(string, string, SearchHighlightOptions, CancellationToken)
Searches text in a document file and returns a highlighted copy.
- HighlightAsync(Attachment, string, SearchHighlightOptions, IReadOnlyList<PageElement>, CancellationToken)
Asynchronously searches text in an Attachment and returns a highlighted copy.
- HighlightAsync(string, string, SearchHighlightOptions, CancellationToken)
Asynchronously searches text in a document file and returns a highlighted copy.
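The asynchronous overloads accept a CancellationToken, which is useful for bounding the time spent on large documents. The following is a sketch only: the parameter order follows the HighlightAsync(string, string, SearchHighlightOptions, CancellationToken) signature listed above, but the return type (assumed to be Task<SearchHighlightResult>) and the way cancellation surfaces (assumed to be an OperationCanceledException) are assumptions not confirmed by this page. HighlightAsyncSketch and RunAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Document.Search;

// Hypothetical wrapper showing one way to call the async overload with a timeout.
public static class HighlightAsyncSketch
{
    public static async Task RunAsync()
    {
        // Request cancellation automatically after 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        var options = new SearchHighlightOptions
        {
            SearchMode = SearchMode.Regex
        };

        try
        {
            // Assumption: HighlightAsync returns Task<SearchHighlightResult>.
            SearchHighlightResult result = await SearchHighlightEngine.HighlightAsync(
                "financial_report.pdf",
                @"\$[\d,]+\.\d{2}",
                options,
                cts.Token);

            await File.WriteAllBytesAsync("financial_report_highlighted.pdf", result.OutputData);
        }
        catch (OperationCanceledException)
        {
            // Assumption: a cancelled search throws OperationCanceledException.
            Console.WriteLine("Search was cancelled before it completed.");
        }
    }
}
```

All four arguments are passed explicitly here because this page does not show which parameters of the async overloads are optional.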