Class LayoutSearchOptions
Global options for LayoutSearchEngine controlling normalization and coordinate handling.
public sealed class LayoutSearchOptions
- Inheritance
- object → LayoutSearchOptions
Examples
Example 1: Default normalization (whitespace collapsing and diacritics removal).
using LMKit.Document.Search;
// Default: NormalizeWhitespace = true, IgnoreDiacritics = true.
var options = new LayoutSearchOptions();
var engine = new LayoutSearchEngine(options);
// "café" and "cafe" will match; "hello   world" (several consecutive spaces) matches "hello world".
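To make the two default normalizations concrete, the following is a minimal sketch of what diacritics removal and whitespace collapsing typically amount to, written against the .NET base class library only. The class `DefaultNormalizationSketch` and its methods are hypothetical names introduced for illustration; they are not part of LM-Kit, and the library's internal implementation may differ.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; not LM-Kit's implementation.
public static class DefaultNormalizationSketch
{
    // Decomposes the text (NFD), drops combining marks, then recomposes (NFC).
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Accents become separate NonSpacingMark characters after NFD.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Replaces every run of whitespace (spaces, tabs, newlines) with one space.
    public static string CollapseWhitespace(string text)
    {
        var sb = new StringBuilder(text.Length);
        bool inRun = false;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                if (!inRun) sb.Append(' ');
                inRun = true;
            }
            else
            {
                sb.Append(c);
                inRun = false;
            }
        }
        return sb.ToString();
    }
}
```

Under these assumptions, `RemoveDiacritics("café")` yields `"cafe"`, and `CollapseWhitespace` turns a tab or a run of spaces between two words into a single space, which is why both forms of each string compare equal after normalization.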
Example 2: Aggressive normalization for OCR text with noise.
using LMKit.Document.Search;
var options = new LayoutSearchOptions
{
    NormalizeWhitespace = true,
    IgnoreDiacritics = true,
    IgnorePunctuation = true,
    IgnoreSymbols = true
};
var engine = new LayoutSearchEngine(options);
Example 3: Custom character filter via regex.
using System.Text.RegularExpressions;
using LMKit.Document.Search;
var options = new LayoutSearchOptions
{
    IgnoreCharactersRegex = new Regex(@"[^\p{L}\p{Nd}\s]", RegexOptions.Compiled)
};
var engine = new LayoutSearchEngine(options);
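The effect of a per-character filter such as the one in Example 3 can be previewed with plain .NET code before handing the pattern to the engine. The sketch below is an approximation under stated assumptions: `RegexFilterSketch.RemoveMatching` is a hypothetical helper, not an LM-Kit API, and it assumes the pattern is tested against each UTF-16 character on its own, as the property description suggests.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not LM-Kit's implementation.
public static class RegexFilterSketch
{
    // Drops every character that the pattern matches when tested in isolation.
    // Iterates UTF-16 code units, so surrogate pairs are tested as two halves.
    public static string RemoveMatching(string text, Regex ignore)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (!ignore.IsMatch(c.ToString()))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With the pattern from Example 3, `RegexFilterSketch.RemoveMatching("Total: $1,250.00 (net)", new Regex(@"[^\p{L}\p{Nd}\s]"))` returns `"Total 125000 net"`: letters, decimal digits, and whitespace survive, and everything else is dropped.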
Properties
- IgnoreCharactersRegex
Optional compiled regular expression, applied per character, that removes additional characters after diacritics removal, whitespace normalization, and the boolean filters have run. The pattern should be character-class-like (e.g., [\p{P}\p{S}] or [^\p{L}\p{Nd}]) for predictable results. Note: Per-character regex checks are slower than the boolean flags; use this option only when needed.
- IgnoreDiacritics
When true, removes diacritics (accents) prior to searching, improving robustness to encoding/OCR variance.
- IgnorePunctuation
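When true, removes all Unicode punctuation characters prior to searching, such as parentheses, hyphens, periods, commas, semicolons, colons, exclamation marks, question marks, and quotes. Covers the Dash, Open, Close, Other, Initial, Final, and Connector punctuation categories (Pd, Ps, Pe, Po, Pi, Pf, Pc).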
- IgnoreSymbols
When true, removes all Unicode symbol characters prior to searching: the Math, Currency, Modifier, and Other symbol categories (Sm, Sc, Sk, So). Useful for dropping '+', currency marks, and similar characters.
- KeepOnlyLettersAndDigits
When true, keeps only letters and digits (everything else is stripped). This implies that whitespace, punctuation, and symbols are removed.
- NormalizeWhitespace
When true, collapses any sequence of whitespace into a single space prior to searching.
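The boolean filters above are described in terms of Unicode general categories, which the .NET `char` type exposes directly. The following sketch shows which characters each flag would plausibly drop. `CategoryFilterSketch.Filter` and its parameters are hypothetical names that mirror the properties for readability; the precedence given to `keepOnlyLettersAndDigits` is an assumption, and LM-Kit's actual implementation may differ.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not LM-Kit's implementation.
public static class CategoryFilterSketch
{
    public static string Filter(
        string text,
        bool ignorePunctuation,
        bool ignoreSymbols,
        bool keepOnlyLettersAndDigits)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (keepOnlyLettersAndDigits)
            {
                // Assumed to take precedence: only letters and decimal digits
                // survive, so whitespace, punctuation, and symbols all go.
                if (char.IsLetterOrDigit(c)) sb.Append(c);
                continue;
            }

            // char.IsPunctuation covers categories Pc, Pd, Ps, Pe, Pi, Pf, Po.
            if (ignorePunctuation && char.IsPunctuation(c)) continue;

            // char.IsSymbol covers categories Sm, Sc, Sk, So.
            if (ignoreSymbols && char.IsSymbol(c)) continue;

            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For the input `"A+B=$5 (ok)!"`, this sketch returns `"A+B=$5 ok"` with only punctuation ignored, `"AB5 (ok)!"` with only symbols ignored, and `"AB5ok"` when keeping only letters and digits. Note that '+', '=', and '$' are symbols rather than punctuation, so noisy OCR text often needs both flags.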