👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document-intelligence/pdf-toolkit/pdf_text_search_with_highlights
PDF Text Search with Highlights for C# .NET Applications
🎯 Purpose of the Demo
An interactive console app that runs layout-aware keyword search across PDFs. Every match comes back with page index, snippet, and bounding-box coordinates suitable for drawing highlight rectangles in a reviewer UI. Two modes: single-PDF REPL (run many queries against one file) or folder-wide scan with CSV export.
All processing runs on-device.
👥 Industry Target Audience
- Legal review: find every reference to a clause, party, or amount across a contract archive.
- Compliance: audit policy documents for forbidden or required terms.
- Reviewer UIs: power "find in document" with highlight rectangles, not just text.
- eDiscovery: produce CSV evidence of where a term appears, with page and bounds.
- Academic / research: search a corpus of papers for a term or phrase.
🚀 Problem Solved
Plain text search returns hits but not positions. Highlighting in a reviewer UI requires both the snippet and its rectangle. PdfSearch.FindTextAsync returns both in one call; this demo wraps it behind a menu that covers two common usage patterns: a REPL over a single PDF and a one-shot folder-wide scan with CSV evidence.
💻 Application Overview
Interactive menu (no command-line arguments) with two modes, plus an option to quit:
| Mode | What it does |
|---|---|
| Search | Prompts for a PDF and password. Prompts once for search options (whole-word matching, case sensitivity, maximum results). Then loops: enter query → see hits → optionally export to CSV → enter next query. A blank query returns to the menu. |
| Folder | Prompts for a folder, password, search options, query, and optional CSV path. Scans every PDF in the folder and writes one combined CSV with the columns source,page,top,left,bottom,right,snippet. |
| Quit | Exit. |
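The combined CSV layout named in the Folder row can be sketched as follows. This is a minimal illustration, not the demo's actual export code: the tuple shape and the helper names `CsvField` and `ToCsv` are hypothetical, and only the column order comes from the table above. Snippets often contain commas or quotes, so the sketch quotes fields in the RFC 4180 style.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Quote a field only when it contains a comma, a double quote, or a line break.
static string CsvField(string value)
{
    if (value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        return value;
    return "\"" + value.Replace("\"", "\"\"") + "\"";
}

// Hypothetical row shape: one tuple per hit (not the demo's real types).
static string ToCsv(IEnumerable<(string Source, int Page, double Top, double Left,
                                 double Bottom, double Right, string Snippet)> hits)
{
    var sb = new StringBuilder();
    sb.Append("source,page,top,left,bottom,right,snippet\n");
    foreach (var h in hits)
    {
        // Invariant culture keeps '.' as the decimal separator on every machine.
        sb.Append(string.Join(",",
            CsvField(h.Source),
            h.Page.ToString(CultureInfo.InvariantCulture),
            h.Top.ToString(CultureInfo.InvariantCulture),
            h.Left.ToString(CultureInfo.InvariantCulture),
            h.Bottom.ToString(CultureInfo.InvariantCulture),
            h.Right.ToString(CultureInfo.InvariantCulture),
            CsvField(h.Snippet)));
        sb.Append('\n');
    }
    return sb.ToString();
}

// Made-up sample hit, for illustration only.
var sample = new[]
{
    (Source: "contract.pdf", Page: 2, Top: 101.5, Left: 72.0, Bottom: 113.5,
     Right: 140.25, Snippet: "the \"Supplier\" shall, within 30 days"),
};
Console.Write(ToCsv(sample));
```

Formatting numbers with the invariant culture matters for audit trails: a machine with a comma decimal separator would otherwise emit values that break the column count.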
Both modes use PdfSearch.FindTextAsync. Match coordinates use the same coordinate system as the rendered page, so they can be drawn directly on a page raster.
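If the page image is rendered at a resolution other than the one the bounds are expressed in, the rectangle needs a uniform scale before drawing. The sketch below assumes, without confirmation from the library, that bounds are in PDF points (1/72 inch) with a top-left origin and y growing downward; `ToPixelRect` is a hypothetical helper, not part of LM-Kit.

```csharp
using System;

// Assumption: top/left/bottom/right are in points (1/72 inch), origin at the
// top-left of the page. Adjust the scale factor if the real units differ.
static (int X, int Y, int Width, int Height) ToPixelRect(
    double top, double left, double bottom, double right, double dpi)
{
    double scale = dpi / 72.0;
    // Floor the near edges and ceil the far edges so the highlight
    // never clips the glyphs it is meant to cover.
    int x = (int)Math.Floor(left * scale);
    int y = (int)Math.Floor(top * scale);
    int x2 = (int)Math.Ceiling(right * scale);
    int y2 = (int)Math.Ceiling(bottom * scale);
    return (x, y, x2 - x, y2 - y);
}

var rect = ToPixelRect(top: 101.5, left: 72.0, bottom: 113.5, right: 140.25, dpi: 144);
Console.WriteLine($"x={rect.X} y={rect.Y} w={rect.Width} h={rect.Height}");
// prints: x=144 y=203 w=137 h=24
```

Rounding outward trades a pixel of extra highlight for the guarantee that no part of the matched text falls outside the rectangle.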
✨ Key Features
- PdfSearch.FindTextAsync(...) with optional pageRange and password.
- TextSearchOptions: Comparison, WholeWord, MaxResults, ContextChars.
- TextMatch.Bounds: bounding box ready for highlight rendering.
- CSV evidence: portable audit trail per file or per folder.
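To make the four option names concrete, here is a plain-string sketch of what settings like these typically mean. It is not LM-Kit's implementation and does not call the library: `FindAll` and `IsWordChar` are hypothetical helpers, the parameter names only mirror the option names listed above, and the word-boundary rule (letters, digits, underscore) is an assumption.

```csharp
using System;
using System.Collections.Generic;

static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

// Illustration only. Use an ordinal comparison (Ordinal or OrdinalIgnoreCase)
// so the matched span has the same length as the query.
static List<(int Index, string Snippet)> FindAll(
    string text, string query, StringComparison comparison,
    bool wholeWord, int maxResults, int contextChars)
{
    var hits = new List<(int Index, string Snippet)>();
    if (query.Length == 0) return hits;

    int pos = 0;
    while (hits.Count < maxResults)
    {
        int i = text.IndexOf(query, pos, comparison);
        if (i < 0) break;
        int end = i + query.Length;

        // Whole-word: the characters on both sides must not be word characters.
        bool boundaryOk = !wholeWord ||
            ((i == 0 || !IsWordChar(text[i - 1])) &&
             (end == text.Length || !IsWordChar(text[end])));

        if (boundaryOk)
        {
            // Snippet: the match plus up to contextChars on each side.
            int start = Math.Max(0, i - contextChars);
            int stop = Math.Min(text.Length, end + contextChars);
            hits.Add((i, text.Substring(start, stop - start)));
        }
        pos = i + 1;
    }
    return hits;
}

string pageText = "The Supplier shall notify suppliers. SUPPLIER means the seller.";
var found = FindAll(pageText, "supplier", StringComparison.OrdinalIgnoreCase,
                    wholeWord: true, maxResults: 10, contextChars: 8);
foreach (var hit in found)
    Console.WriteLine($"{hit.Index}: {hit.Snippet}");
```

With whole-word matching on, the plural "suppliers" is skipped, so the sample yields two hits (at offsets 4 and 37); turning it off yields three.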
🧠 Model
- None. This demo is pure document plumbing and does not load an LLM.
🛠️ Getting Started
📋 Prerequisites
- .NET 8.0 or later
▶️ Running the Application
git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document-intelligence/pdf-toolkit/pdf_text_search_with_highlights
dotnet run
Pick a mode from the menu and follow the prompts.