Class PdfSearchableMaker

Namespace
LMKit.Document.Pdf
Assembly
LM-Kit.NET.dll

Makes an existing PDF searchable by adding invisible text overlays to pages that lack embedded text. The original document structure, images, annotations, and metadata are fully preserved. The output always contains the same number of pages as the input; only the pages targeted by PageRange are OCRed.

public static class PdfSearchableMaker
Inheritance
PdfSearchableMaker
Inherited Members

Examples

Example: Make a scanned PDF searchable using LM-Kit OCR.

using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

var ocr = new LMKitOcr();

// Simple usage with defaults
PdfSearchableMaker.ConvertToFile("scanned.pdf", ocr, "searchable.pdf");

// Advanced usage with per-call progress
var options = new PdfSearchableMakerOptions
{
    MaxDegreeOfParallelism = 4,
    Progress = new Progress<OcrProgressEventArgs>(e =>
        Console.WriteLine($"[{(double)(e.PageIndex + 1) / e.TotalPages:P0}] Page {e.PageIndex + 1}"))
};

PdfSearchableMaker.ConvertToFile("scanned.pdf", ocr, "searchable.pdf", options);

Remarks

Unlike ImageToSearchablePdf, which converts images to new PDFs, PdfSearchableMaker operates on existing PDF documents. It opens the source PDF, runs OCR on pages that need it, inserts invisible text objects into those pages, and saves the result. Pages that already contain text can optionally be skipped or re-OCRed, controlled by PdfSearchableMakerOptions.
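
The flow described above can also be driven asynchronously with a cancellation token. The following is a minimal sketch rather than a definitive implementation. It uses the ConvertToFileAsync(string, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken) overload listed under Methods and the LMKitOcr engine from the example above. The file names and the five-minute timeout are placeholders. It also assumes that the method returns an awaitable task and that cancellation surfaces as OperationCanceledException, the usual .NET convention.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

// Placeholder timeout: request cancellation if the conversion runs longer than five minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));

var ocr = new LMKitOcr();
var options = new PdfSearchableMakerOptions { MaxDegreeOfParallelism = 2 };

try
{
    // Assumed to return an awaitable Task; the argument order follows the signature listed under Methods.
    await PdfSearchableMaker.ConvertToFileAsync(
        "scanned.pdf", ocr, "searchable.pdf", options, cts.Token);
    Console.WriteLine("Conversion finished.");
}
catch (OperationCanceledException)
{
    // Assumption: a cancelled conversion throws OperationCanceledException.
    Console.WriteLine("Conversion was cancelled.");
}
```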

Progress can be reported per call via the Progress property of PdfSearchableMakerOptions (recommended for concurrent scenarios) or globally via the static Progress event.

Methods

AddTextOverlay(Attachment, int, PageElement, PdfSaveOptions)

Adds an invisible text overlay to a single page of a PDF attachment using a precomputed PageElement. No OCR is performed.

AddTextOverlayToFile(string, int, PageElement, string, PdfSaveOptions)

Adds an invisible text overlay to a single page of a PDF file using a precomputed PageElement. No OCR is performed.

AddTextOverlays(Attachment, IEnumerable<PageTextOverlay>, PdfSaveOptions)

Adds invisible text overlays to multiple pages of a PDF attachment in a single operation using precomputed PageElement instances. No OCR is performed.

AddTextOverlaysToFile(string, IEnumerable<PageTextOverlay>, string, PdfSaveOptions)

Adds invisible text overlays to multiple pages of a PDF file in a single operation using precomputed PageElement instances. No OCR is performed.

Convert(Attachment, OcrEngine, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF attachment searchable synchronously by adding invisible text overlays.

Convert(string, OcrEngine, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF file searchable synchronously and returns the result as an attachment.

ConvertAsync(Attachment, OcrEngine, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF attachment searchable asynchronously by adding invisible text overlays. The output contains the same number of pages as the input.

ConvertAsync(string, OcrEngine, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF file searchable asynchronously and returns the result as an attachment.

ConvertDirectory(string, OcrEngine, string, string, bool, PdfSearchableMakerOptions, CancellationToken)

Makes all PDF files in a directory searchable, writing results to the specified output directory.

ConvertDirectoryAsync(string, OcrEngine, string, string, bool, PdfSearchableMakerOptions, CancellationToken)

Asynchronously makes all PDF files in a directory searchable, writing results to the specified output directory.

ConvertFiles(IEnumerable<string>, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Makes multiple PDF files searchable in sequence, writing each result to the specified output directory. Output files retain the original file names.

ConvertFilesAsync(IEnumerable<string>, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Asynchronously makes multiple PDF files searchable in sequence, writing each result to the specified output directory. Output files retain the original file names.

ConvertToFile(Attachment, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF attachment searchable and writes the result to a file synchronously.

ConvertToFile(string, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF file searchable and writes the result to a new file synchronously.

ConvertToFileAsync(Attachment, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF attachment searchable and writes the result to a file asynchronously.

ConvertToFileAsync(string, OcrEngine, string, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF file searchable and writes the result to a new file asynchronously.

ConvertToStreamAsync(Attachment, OcrEngine, Stream, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF attachment searchable asynchronously and writes the result to a stream.

ConvertToStreamAsync(string, OcrEngine, Stream, PdfSearchableMakerOptions, CancellationToken)

Makes a PDF file searchable asynchronously and writes the result to a stream.
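
When the result should stay in memory (for example, to return it from a web endpoint), the stream overload can be used. The sketch below is illustrative only. It follows the ConvertToStreamAsync(string, OcrEngine, Stream, PdfSearchableMakerOptions, CancellationToken) signature listed above, passes default options and CancellationToken.None explicitly, and uses a placeholder input file name. Whether the method leaves the stream open is not stated here, so the sketch reads the bytes with ToArray(), which works on a MemoryStream even after it has been closed.

```csharp
using System;
using System.IO;
using System.Threading;
using LMKit.Document.Pdf;
using LMKit.Extraction.Ocr;

var ocr = new LMKitOcr();

// Collect the searchable PDF in memory instead of writing it to disk.
using var output = new MemoryStream();

// Assumed to return an awaitable Task; options and token are passed explicitly.
await PdfSearchableMaker.ConvertToStreamAsync(
    "scanned.pdf", ocr, output, new PdfSearchableMakerOptions(), CancellationToken.None);

// ToArray() copies the written bytes regardless of the stream position or open state.
byte[] pdfBytes = output.ToArray();
Console.WriteLine($"Searchable PDF size: {pdfBytes.Length} bytes");
```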

Events

Progress

Raised after each page is processed (OCRed or skipped) during a conversion operation. For concurrent or multi-tenant scenarios, prefer the per-call Progress property of PdfSearchableMakerOptions over this static event.
