👉 Try the demo: https://github.com/LM-Kit/lm-kit-net-samples/tree/main/console_net/document-intelligence/pdf-toolkit/pdf_text_search_with_highlights

PDF Text Search with Highlights for C# .NET Applications


🎯 Purpose of the Demo

An interactive console app that runs layout-aware keyword search across PDFs. Every match comes back with its page index, a text snippet, and bounding-box coordinates suitable for drawing highlight rectangles in a reviewer UI. It offers two modes: a single-PDF REPL (run many queries against one file) and a folder-wide scan with CSV export.

All processing runs on-device.


👥 Industry Target Audience

  • Legal review: find every reference to a clause, party, or amount across a contract archive.
  • Compliance: audit policy documents for forbidden or required terms.
  • Reviewer UIs: power "find in document" with highlight rectangles, not just text.
  • eDiscovery: produce CSV evidence of where a term appears, with page and bounds.
  • Academic / research: search a corpus of papers for a keyword or phrase.

🚀 Problem Solved

Plain text search returns hits but not positions. Highlighting in a reviewer UI requires both the snippet and its rectangle on the page. PdfSearch.FindTextAsync returns both in one call; this demo wraps it behind a menu that handles two common usage patterns: a REPL over a single PDF and a one-shot folder-wide scan with CSV evidence.


💻 Application Overview

Interactive menu (no command-line arguments) with two modes:

Mode   | What it does
Search | Prompts for a PDF and password, then prompts once for search options (whole-word, case sensitivity, max results). Then loops: enter query → see hits → optionally export to CSV → enter next query. A blank query returns to the menu.
Folder | Prompts for a folder, password, search options, query, and optional CSV path. Scans every PDF and writes one combined CSV with the columns source,page,top,left,bottom,right,snippet.
Quit   | Exits the app.
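
The combined CSV written by Folder mode can be sketched as follows. This is a minimal illustration, not the demo's actual code: MatchRow and MatchCsv are hypothetical names introduced here, the column order follows the source,page,top,left,bottom,right,snippet layout above, and the quoting rules (RFC 4180 style) are an assumption about how snippets containing commas or quotes would be handled.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical row type for this sketch; it is NOT an LM-Kit type.
public record MatchRow(string Source, int Page, double Top, double Left,
                       double Bottom, double Right, string Snippet);

public static class MatchCsv
{
    public const string Header = "source,page,top,left,bottom,right,snippet";

    // Quote a field when it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled (RFC 4180 style, assumed here).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Format one match as a CSV line. Numbers use the invariant culture so the
    // decimal separator is always '.', regardless of the machine's locale.
    public static string FormatRow(MatchRow r)
    {
        var inv = CultureInfo.InvariantCulture;
        return string.Join(",",
            Escape(r.Source),
            r.Page.ToString(inv),
            r.Top.ToString("0.##", inv),
            r.Left.ToString("0.##", inv),
            r.Bottom.ToString("0.##", inv),
            r.Right.ToString("0.##", inv),
            Escape(r.Snippet));
    }

    // Build the whole file content: header line followed by one line per match.
    public static string Build(IEnumerable<MatchRow> rows)
    {
        var sb = new StringBuilder();
        sb.Append(Header).Append('\n');
        foreach (var r in rows) sb.Append(FormatRow(r)).Append('\n');
        return sb.ToString();
    }
}
```

Escaping the snippet column matters most in practice, because text around a match often contains commas or quotation marks that would otherwise shift the columns.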

Both modes use PdfSearch.FindTextAsync. Match coordinates use the same coordinate system as the rendered page, so they can be drawn directly on a page raster.
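
If the page is rasterized at a size different from the page's own units, the bounds need scaling before drawing. The sketch below shows one way to do that. Everything in it is an assumption for illustration: PageBounds, PixelRect, and HighlightMapper are hypothetical types (not LM-Kit's), and the origin is assumed to be the top-left corner with y growing downward.

```csharp
using System;

// Hypothetical types for this sketch; they are NOT LM-Kit types.
public readonly record struct PageBounds(double Left, double Top, double Right, double Bottom);
public readonly record struct PixelRect(int X, int Y, int Width, int Height);

public static class HighlightMapper
{
    // Scale a match rectangle from page units to raster pixels.
    // Assumes a top-left origin for both coordinate spaces.
    public static PixelRect ToPixels(PageBounds b, double pageWidth, double pageHeight,
                                     int rasterWidth, int rasterHeight)
    {
        if (pageWidth <= 0) throw new ArgumentOutOfRangeException(nameof(pageWidth));
        if (pageHeight <= 0) throw new ArgumentOutOfRangeException(nameof(pageHeight));

        double sx = rasterWidth / pageWidth;
        double sy = rasterHeight / pageHeight;

        // Round outward so the highlight never clips the matched glyphs,
        // then clamp to the raster so drawing stays inside the image.
        int x0 = Math.Clamp((int)Math.Floor(b.Left * sx), 0, rasterWidth);
        int y0 = Math.Clamp((int)Math.Floor(b.Top * sy), 0, rasterHeight);
        int x1 = Math.Clamp((int)Math.Ceiling(b.Right * sx), 0, rasterWidth);
        int y1 = Math.Clamp((int)Math.Ceiling(b.Bottom * sy), 0, rasterHeight);

        return new PixelRect(x0, y0, x1 - x0, y1 - y0);
    }
}
```

When the raster is rendered at the page's native size, both scale factors are 1 and the rectangle passes through unchanged apart from rounding.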

✨ Key Features

  • PdfSearch.FindTextAsync(...) with optional pageRange and password.
  • TextSearchOptions: Comparison, WholeWord, MaxResults, ContextChars.
  • TextMatch.Bounds: bounding box ready for highlight rendering.
  • CSV evidence: portable audit trail per file or per folder.
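
To make the option names above concrete, here is a rough sketch of what comparison, whole-word, max-results, and context-chars settings typically mean, applied to a plain in-memory string. It does not call LM-Kit and is not its implementation: SketchSearchOptions and PlainTextSearch are hypothetical stand-ins, and the defaults and the letter-or-digit word boundary rule are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in that mirrors the option names listed above.
// It is NOT the LM-Kit TextSearchOptions type, and the defaults are assumed.
public sealed record SketchSearchOptions(
    StringComparison Comparison = StringComparison.OrdinalIgnoreCase,
    bool WholeWord = false,
    int MaxResults = int.MaxValue,
    int ContextChars = 20);

public static class PlainTextSearch
{
    // Returns (start index, snippet) for each accepted hit, in document order.
    // Assumes an ordinal comparison, so a hit is exactly query.Length chars long.
    public static List<(int Index, string Snippet)> Find(
        string text, string query, SketchSearchOptions opt)
    {
        var hits = new List<(int Index, string Snippet)>();
        if (string.IsNullOrEmpty(query)) return hits;

        int pos = 0;
        while (hits.Count < opt.MaxResults)
        {
            int i = text.IndexOf(query, pos, opt.Comparison);
            if (i < 0) break;
            int end = i + query.Length;

            // Whole-word: the characters on both sides must not be letters or digits.
            bool leftOk = i == 0 || !char.IsLetterOrDigit(text[i - 1]);
            bool rightOk = end == text.Length || !char.IsLetterOrDigit(text[end]);

            if (!opt.WholeWord || (leftOk && rightOk))
            {
                // Snippet: the hit plus up to ContextChars characters on each side.
                int from = Math.Max(0, i - opt.ContextChars);
                int to = Math.Min(text.Length, end + opt.ContextChars);
                hits.Add((i, text.Substring(from, to - from)));
            }
            pos = i + 1; // continue scanning just past this hit's start
        }
        return hits;
    }
}
```

For example, searching "Cat catalog cat." for "cat" with WholeWord enabled skips the hit inside "catalog" and keeps the other two.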

🧠 Model

  • None. This demo is pure document plumbing and does not load an LLM.

🛠️ Getting Started

📋 Prerequisites

  • .NET 8.0 or later

▶️ Running the Application

git clone https://github.com/LM-Kit/lm-kit-net-samples
cd lm-kit-net-samples/console_net/document-intelligence/pdf-toolkit/pdf_text_search_with_highlights
dotnet run

Pick a mode from the menu and follow the prompts.
