
Method Highlight

Namespace
LMKit.Document.Search
Assembly
LM-Kit.NET.dll

Highlight(string, string, SearchHighlightOptions, CancellationToken)

Searches text in a document file and returns a highlighted copy.

public static SearchHighlightResult Highlight(string inputPath, string query, SearchHighlightOptions options = null, CancellationToken cancellationToken = default)

Parameters

inputPath string

Path to the input document (PDF or image).

query string

The search text or pattern. For Text, an exact substring. For Regex, a .NET regular expression pattern. For Fuzzy, the approximate text to locate.

options SearchHighlightOptions

Search and highlight options. When null, default options are used (text mode, case-insensitive, semi-transparent yellow highlight).

cancellationToken CancellationToken

A token that can be used to cancel the operation.
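To make the three interpretations of query more concrete, the sketch below mimics them with plain .NET calls. It does not use LM-Kit internals: QueryModeSketch and its methods are hypothetical names, applying case-insensitivity to the regex is an assumption carried over from the documented defaults, and the edit-distance matcher is only one common way to define approximate matching. The library's actual fuzzy algorithm may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration only; not part of LM-Kit.
public static class QueryModeSketch
{
    // Text mode: start index of every case-insensitive occurrence of an exact substring.
    public static List<int> FindText(string haystack, string query)
    {
        if (string.IsNullOrWhiteSpace(query))
            throw new ArgumentException("query is null or whitespace.", nameof(query));

        var hits = new List<int>();
        int i = haystack.IndexOf(query, StringComparison.OrdinalIgnoreCase);
        while (i >= 0)
        {
            hits.Add(i);
            i = haystack.IndexOf(query, i + query.Length, StringComparison.OrdinalIgnoreCase);
        }
        return hits;
    }

    // Regex mode: the query is a .NET regular expression pattern.
    // Case-insensitivity here is an assumption, not documented behavior.
    public static List<string> FindRegex(string haystack, string pattern)
    {
        var hits = new List<string>();
        foreach (Match m in Regex.Matches(haystack, pattern, RegexOptions.IgnoreCase))
            hits.Add(m.Value);
        return hits;
    }

    // Fuzzy mode (assumed semantics): words within maxEdits edits of the query.
    public static List<string> FindFuzzy(string haystack, string query, int maxEdits)
    {
        var hits = new List<string>();
        string needle = query.ToLowerInvariant();
        char[] separators = { ' ', '\t', '\r', '\n' };
        foreach (string word in haystack.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (EditDistance(word.ToLowerInvariant(), needle) <= maxEdits)
                hits.Add(word);
        }
        return hits;
    }

    // Levenshtein distance computed with two rolling rows.
    private static int EditDistance(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }
}
```

With this sketch, a text query of "VAT" matches both "VAT" and "vat", a regex query such as INV-\d+ matches whole invoice numbers, and a fuzzy query of "total" with one allowed edit also accepts an OCR-style misspelling like "Totl".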

Returns

SearchHighlightResult

A SearchHighlightResult containing the highlighted document and match metadata.

Examples

using System;
using System.IO;
using LMKit.Document.Search;

// Search for all occurrences of "VAT" in a PDF invoice.
SearchHighlightResult result = SearchHighlightEngine.Highlight(
    "invoice.pdf",
    "VAT");

Console.WriteLine($"Found {result.TotalMatches} match(es) across {result.ScannedPages} page(s).");

// Save the highlighted PDF.
File.WriteAllBytes("invoice_highlighted.pdf", result.OutputData);

// Print each match with its page number and surrounding context.
foreach (TextMatch match in result.Matches)
{
    Console.WriteLine($"  Page {match.PageIndex + 1}: \"{match.Text}\"");
    if (!string.IsNullOrEmpty(match.Snippet))
    {
        Console.WriteLine($"    ...{match.Snippet}...");
    }
}

Exceptions

ArgumentNullException

inputPath is null.

FileNotFoundException

The specified file does not exist.

ArgumentException

query is null or whitespace.
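When the path and query come from user input, it can help to check the documented preconditions before calling Highlight. The helper below is a hypothetical pre-check, not part of LM-Kit: it mirrors the three conditions listed above, in the order they are listed, and returns a message instead of throwing. The order in which the library itself performs these checks is an assumption.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical helper; mirrors the documented exception conditions.
public static class HighlightInputCheck
{
    // Returns null when the inputs satisfy the documented preconditions,
    // otherwise a description of the first documented condition that fails.
    public static string? Describe(string? inputPath, string? query)
    {
        if (inputPath is null)
            return "ArgumentNullException: inputPath is null.";

        if (!File.Exists(inputPath))
            return "FileNotFoundException: the specified file does not exist.";

        if (string.IsNullOrWhiteSpace(query))
            return "ArgumentException: query is null or whitespace.";

        return null;
    }
}
```

A caller would invoke HighlightInputCheck.Describe(path, query) first and only call SearchHighlightEngine.Highlight when the result is null. Wrapping the call in a try/catch for the same three exception types is the alternative when a pre-check is not practical.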

Highlight(Attachment, string, SearchHighlightOptions, IReadOnlyList<PageElement>, CancellationToken)

Searches text in an Attachment and returns a highlighted copy.

public static SearchHighlightResult Highlight(Attachment attachment, string query, SearchHighlightOptions options = null, IReadOnlyList<PageElement> pageElements = null, CancellationToken cancellationToken = default)

Parameters

attachment Attachment

The source document (PDF or image). Must not be null.

query string

The search text or pattern. For Text, an exact substring. For Regex, a .NET regular expression pattern. For Fuzzy, the approximate text to locate.

options SearchHighlightOptions

Search and highlight options. When null, default options are used.

pageElements IReadOnlyList<PageElement>

Optional pre-computed page elements indexed by zero-based page index. When a non-null entry exists for a given page, it is used for search instead of the attachment's internal text extraction. This enables highlighting on raster PDFs or images whose text was obtained via prior OCR or layout analysis.

cancellationToken CancellationToken

A token that can be used to cancel the operation.
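Because pageElements is indexed by zero-based page index and only non-null entries override the attachment's own text extraction, the list can be sparse: pages that already carry a text layer can be left null and only the scanned pages filled in. The sketch below shows one way to build such a list. PageElementSketch and BuildSparse are hypothetical names, and the helper is generic so that it does not depend on any LM-Kit type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of LM-Kit.
public static class PageElementSketch
{
    // Builds an array with one slot per page. Only the pages listed in
    // pagesToFill receive a value from extract; every other slot stays null.
    public static T[] BuildSparse<T>(int pageCount, IEnumerable<int> pagesToFill, Func<int, T> extract)
        where T : class
    {
        var slots = new T[pageCount];
        foreach (int page in pagesToFill)
        {
            if (page < 0 || page >= pageCount)
                throw new ArgumentOutOfRangeException(nameof(pagesToFill), $"Page index {page} is out of range.");

            slots[page] = extract(page);
        }
        return slots;
    }
}

// Assumed usage with the OCR calls from this page's scanned-PDF example,
// running OCR only on pages 0 and 3 (requires PageElement to be a class):
//
// PageElement[] pageElements = PageElementSketch.BuildSparse(
//     attachment.PageCount,
//     new[] { 0, 3 },
//     i => ocr.RunAsync(attachment, i).GetAwaiter().GetResult().PageElement);
```

Limiting OCR to the pages that need it avoids the cost of recognizing pages whose text the attachment can already provide.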

Returns

SearchHighlightResult

A SearchHighlightResult containing the highlighted document and match metadata.

Examples

Search a PDF loaded as an Attachment:

using System;
using System.IO;
using LMKit.Data;
using LMKit.Document.Search;

using var attachment = new Attachment("report.pdf");

SearchHighlightResult result = SearchHighlightEngine.Highlight(
    attachment,
    "revenue");

Console.WriteLine($"Matches: {result.TotalMatches}");
File.WriteAllBytes("report_highlighted.pdf", result.OutputData);

Search with OCR-extracted page elements for a scanned PDF:

using System;
using System.IO;
using LMKit.Data;
using LMKit.Document.Layout;
using LMKit.Document.Search;
using LMKit.Integrations.Tesseract;

using var attachment = new Attachment("scanned_invoice.pdf");

// Run OCR on each page to extract text with coordinates.
using var ocr = new TesseractOcr();
var pageElements = new PageElement[attachment.PageCount];

for (int i = 0; i < attachment.PageCount; i++)
{
    OcrResult ocrResult = ocr.RunAsync(attachment, i).GetAwaiter().GetResult();
    pageElements[i] = ocrResult.PageElement;
}

// Search and highlight using the OCR results.
SearchHighlightResult result = SearchHighlightEngine.Highlight(
    attachment,
    "total",
    pageElements: pageElements);

File.WriteAllBytes("scanned_invoice_highlighted.pdf", result.OutputData);
Console.WriteLine($"Found {result.TotalMatches} match(es).");

Exceptions

ArgumentNullException

attachment is null.

ArgumentException

query is null or whitespace.
