Class LayoutSearchEngine

Namespace
LMKit.Document.Search
Assembly
LM-Kit.NET.dll

Provides layout-aware search over PageElement instances and their TextElement children. Supports exact, regex, fuzzy, region-based, proximity, and block-level queries, each with a multi-page overload. Every match reports its bounding box and the elements that contributed to it.

public sealed class LayoutSearchEngine
Inheritance
LayoutSearchEngine
Examples

Example 1: Exact text search on a single page.

using System;
using System.Collections.Generic;
using LMKit.Document.Layout;
using LMKit.Document.Pdf;
using LMKit.Document.Search;

// Load a PDF and get the first page layout.
PdfInfo info = PdfInfo.Load("invoice.pdf");
PageElement page = info.Pages[0].GetLayout();

var engine = new LayoutSearchEngine();
List<TextMatch> matches = engine.FindText(page, "Total Due");

// Print each match with its bounding box.
foreach (TextMatch match in matches)
{
    Console.WriteLine($"Found \"{match.Text}\" at {match.BoundingBox}");
}

Example 2: Regex search across all pages.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using LMKit.Document.Layout;
using LMKit.Document.Pdf;
using LMKit.Document.Search;

// Build the layout of every page in the document.
PdfInfo info = PdfInfo.Load("report.pdf");
List<PageElement> pages = info.Pages
    .Select(p => p.GetLayout())
    .ToList();

var engine = new LayoutSearchEngine();
var regexOpts = new RegexSearchOptions
{
    RegexOptions = RegexOptions.IgnoreCase | RegexOptions.Compiled,
    MaxResults = 200
};

// Match dates written as four digits, two digits, two digits (for example 2024-01-31).
List<TextMatch> matches = engine.FindRegex(pages, @"\d{4}-\d{2}-\d{2}", regexOpts);

foreach (TextMatch match in matches)
{
    // Show a 1-based page number.
    Console.WriteLine($"Page {match.PageIndex + 1}: date \"{match.Text}\"");
}

Example 3: Fuzzy search for an approximate term.

using System;
using System.Collections.Generic;
using LMKit.Document.Layout;
using LMKit.Document.Pdf;
using LMKit.Document.Search;

PdfInfo info = PdfInfo.Load("scanned_contract.pdf");
PageElement page = info.Pages[0].GetLayout();

var engine = new LayoutSearchEngine();
var fuzzyOpts = new FuzzySearchOptions
{
    MaxEditDistance = 2,
    MinScore = 0.7
};

// Tolerate OCR noise or small typos around the query term.
List<TextMatch> matches = engine.FindFuzzy(page, "indemnification", fuzzyOpts);

foreach (TextMatch match in matches)
{
    Console.WriteLine($"Score {match.Score:F2}: \"{match.Text}\"");
}

Example 4: Extract text between two anchors.

using System;
using System.Collections.Generic;
using LMKit.Document.Layout;
using LMKit.Document.Pdf;
using LMKit.Document.Search;

PdfInfo info = PdfInfo.Load("letter.pdf");
PageElement page = info.Pages[0].GetLayout();

var engine = new LayoutSearchEngine();
var betweenOpts = new BetweenOptions
{
    Inclusive = false,
    MaxChars = 5000
};

// Extract the text between the "Dear" and "Sincerely" anchors, excluding the anchors.
List<TextMatch> spans = engine.FindBetween(page, "Dear", "Sincerely", betweenOpts);

foreach (TextMatch span in spans)
{
    Console.WriteLine($"Letter body ({span.Text.Length} chars): {span.Text}");
}

Constructors

LayoutSearchEngine(LayoutSearchOptions)

Initializes a new instance of the LayoutSearchEngine class.

Methods

FindBetween(PageElement, string, string, BetweenOptions)

Extracts the text located between the first occurrence of startQuery and the first occurrence of endQuery. Can optionally include the anchors and cross line/block boundaries (within the same page).

FindBetween(IEnumerable<PageElement>, string, string, BetweenOptions)

For each page, extracts the text located between the first occurrence of startQuery and the first occurrence of endQuery. Each page is processed independently, so a match never spans a page boundary.
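
The anchor semantics described above can be illustrated on a plain string. This is a minimal sketch, not the library's implementation: BetweenSketch and TryBetween are hypothetical names, the inclusive flag only mirrors the Inclusive option shown in Example 4, and searching for the end anchor after the start anchor is an assumption.

```csharp
using System;

static class BetweenSketch
{
    // Hypothetical helper: finds the text between the first occurrence of
    // start and the first occurrence of end that follows it.
    // Returns false when either anchor is missing.
    public static bool TryBetween(
        string text, string start, string end, bool inclusive, out string result)
    {
        result = string.Empty;

        int s = text.IndexOf(start, StringComparison.Ordinal);
        if (s < 0) return false;

        // Assumption: the end anchor is searched for after the start anchor.
        int bodyStart = s + start.Length;
        int e = text.IndexOf(end, bodyStart, StringComparison.Ordinal);
        if (e < 0) return false;

        result = inclusive
            ? text.Substring(s, e + end.Length - s)       // keep both anchors
            : text.Substring(bodyStart, e - bodyStart);   // body only
        return true;
    }
}
```

With inclusive set to false, only the body between the anchors is returned; with true, the anchors themselves are part of the result.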

FindFuzzy(PageElement, string, FuzzySearchOptions)

Performs token-aware fuzzy search using Damerau–Levenshtein distance over sliding windows of the page text. Useful when the source contains OCR noise or minor typos. Normalization (whitespace/diacritics/optional char-stripping) is applied to both the page text and the query.

FindFuzzy(IEnumerable<PageElement>, string, FuzzySearchOptions)

Performs fuzzy search across multiple pages.
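
To make the edit-distance idea concrete, the sketch below computes the restricted Damerau-Levenshtein (optimal string alignment) distance between two strings. It is an illustration only: FuzzySketch, Distance, and Score are hypothetical names, the score formula is an assumption rather than the library's definition of Score or MinScore, and the sliding-window and tokenization steps are not shown.

```csharp
using System;

static class FuzzySketch
{
    // Restricted Damerau-Levenshtein (optimal string alignment) distance:
    // counts insertions, deletions, substitutions, and swaps of adjacent characters.
    public static int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                     // substitution or match

                // Adjacent transposition, e.g. "ab" versus "ba".
                if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                    best = Math.Min(best, d[i - 2, j - 2] + 1);

                d[i, j] = best;
            }
        }
        return d[a.Length, b.Length];
    }

    // Assumed similarity score in [0, 1]; 1 means the strings are identical.
    public static double Score(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / longest;
    }
}
```

Under this sketch, an OCR-damaged token such as "indemnificaton" is one edit away from "indemnification", so it would pass a maximum edit distance of 2.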

FindInRegion(PageElement, Rectangle, RegionSearchOptions)

Returns text matches within a geometric region. You can choose intersection or containment semantics and whether to merge adjacent elements.

FindInRegion(IEnumerable<PageElement>, Rectangle, RegionSearchOptions)

Returns text matches within a geometric region across multiple pages. The same rectangle is applied to each page independently, in page-local coordinates.
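
The difference between intersection and containment semantics can be sketched with System.Drawing.RectangleF standing in for element bounding boxes. This is an assumption made for illustration: RegionSketch and Filter are hypothetical names, and the library's own Rectangle type and RegionSearchOptions members are not shown here.

```csharp
using System.Collections.Generic;
using System.Drawing;

static class RegionSketch
{
    // Keeps the boxes that satisfy the chosen semantics relative to region.
    public static List<RectangleF> Filter(
        IEnumerable<RectangleF> elements, RectangleF region, bool requireContainment)
    {
        var hits = new List<RectangleF>();
        foreach (RectangleF box in elements)
        {
            bool match = requireContainment
                ? region.Contains(box)         // box lies entirely inside region
                : region.IntersectsWith(box);  // box overlaps region at least partly
            if (match) hits.Add(box);
        }
        return hits;
    }
}
```

Containment is the stricter choice: an element that straddles the region's edge is returned under intersection semantics but dropped under containment.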

FindNear(PageElement, string, ProximityOptions)

Finds occurrences of query located near the specified anchor region.

FindNear(IEnumerable<PageElement>, string, ProximityOptions)

Finds instances of query located within a proximity of the specified anchor region across multiple pages. The same anchor region and radius are applied to each page independently (page-local coordinates).
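
One plausible way to test proximity is the shortest edge-to-edge distance between a candidate box and the anchor region. The sketch below assumes that metric; the library may measure distance differently (for example, center to center). ProximitySketch, Gap, and IsNear are hypothetical names, and System.Drawing.RectangleF is used as a stand-in for bounding boxes.

```csharp
using System;
using System.Drawing;

static class ProximitySketch
{
    // Shortest edge-to-edge distance between two boxes; 0 when they overlap or touch.
    public static double Gap(RectangleF a, RectangleF b)
    {
        double dx = Math.Max(0f, Math.Max(a.Left - b.Right, b.Left - a.Right));
        double dy = Math.Max(0f, Math.Max(a.Top - b.Bottom, b.Top - a.Bottom));
        return Math.Sqrt(dx * dx + dy * dy);
    }

    // True when the candidate lies within radius of the anchor region.
    public static bool IsNear(RectangleF candidate, RectangleF anchor, double radius)
        => Gap(candidate, anchor) <= radius;
}
```

Because the boxes are expressed in page-local coordinates, the same anchor and radius can be reused on every page without translation.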

FindRegex(PageElement, string, RegexSearchOptions)

Finds regular expression matches within a page's text and returns layout-aware results. The regex runs over the normalized page text (options are applied to the text, not the pattern).

FindRegex(IEnumerable<PageElement>, string, RegexSearchOptions)

Finds regular expression matches across multiple pages.

FindText(PageElement, string, TextSearchOptions)

Finds exact (substring) matches of query within a page's text, honoring textOptions. Results include the matched text, a context snippet, the union bounding box, and contributing elements. Normalization (whitespace/diacritics/optional char-stripping) is applied to both the page text and the query.

FindText(IEnumerable<PageElement>, string, TextSearchOptions)

Finds exact matches across multiple pages and annotates each result with its page index.
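
The normalization step mentioned above (whitespace and diacritics) can be approximated as follows. This is a sketch under assumptions, not the library's normalizer: NormalizationSketch and its methods are hypothetical names, the optional character stripping is omitted, and the comparison here is ordinal and case-sensitive.

```csharp
using System;
using System.Globalization;
using System.Text;

static class NormalizationSketch
{
    // Removes diacritics and collapses whitespace runs into single spaces.
    public static string Normalize(string input)
    {
        // Decompose so that accents become separate combining marks.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        bool pendingSpace = false;

        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue; // drop combining accents

            if (char.IsWhiteSpace(c))
            {
                pendingSpace = sb.Length > 0; // ignore leading whitespace
                continue;
            }

            if (pendingSpace)
            {
                sb.Append(' '); // one space per whitespace run
                pendingSpace = false;
            }
            sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Substring search after normalizing both the page text and the query.
    public static bool ContainsNormalized(string pageText, string query)
        => Normalize(pageText).Contains(Normalize(query), StringComparison.Ordinal);
}
```

Normalizing both sides is what lets a query typed without accents or with single spaces match page text that was extracted with irregular spacing or accented characters.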
