Method DetectParagraphs

Namespace
LMKit.Document.Layout
Assembly
LM-Kit.NET.dll

DetectParagraphs()

Detects and groups lines into paragraphs via a joint layout-and-NLP analysis. Combines geometric cues (inter-line spacing ratios, indentation deltas, baseline alignment, left/right rag, column/region membership, style changes) with linguistic signals (sentence boundary confidence, discourse/continuation markers, list/quote/heading patterns, cross-line semantic cohesion). Designed to be robust to OCR noise, rotation/skew, and multilingual scripts.
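To make two of the geometric cues concrete, the following is a minimal sketch of grouping lines by inter-line spacing ratio and indentation delta. It is not LM-Kit's implementation: `LineBox`, `GeometricGrouping`, and the threshold values are hypothetical, and the linguistic signals described above are left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical line box in deskewed coordinates (y grows downward); not an LM-Kit type.
public readonly record struct LineBox(double Left, double Top, double Height);

public static class GeometricGrouping
{
    // Returns groups of line indices, one group per paragraph candidate.
    // A new group starts when the gap to the previous line exceeds
    // gapRatio x the (upper) median gap, or when the line starts further right
    // than the previous one by more than indentRatio x its own height.
    // Both thresholds are arbitrary illustration values.
    public static List<List<int>> Group(
        IReadOnlyList<LineBox> lines, double gapRatio = 1.5, double indentRatio = 1.0)
    {
        var groups = new List<List<int>>();
        if (lines.Count == 0) return groups;

        // Vertical gap between the bottom of line i-1 and the top of line i.
        var gaps = new List<double>();
        for (int i = 1; i < lines.Count; i++)
            gaps.Add(lines[i].Top - (lines[i - 1].Top + lines[i - 1].Height));

        double median = gaps.Count == 0
            ? 0
            : gaps.OrderBy(g => g).ElementAt(gaps.Count / 2);

        groups.Add(new List<int> { 0 });
        for (int i = 1; i < lines.Count; i++)
        {
            bool bigGap = gaps[i - 1] > gapRatio * median;
            bool indented = lines[i].Left - lines[i - 1].Left > indentRatio * lines[i].Height;
            if (bigGap || indented) groups.Add(new List<int>());
            groups[^1].Add(i);
        }
        return groups;
    }
}
```

A purely geometric rule like this one breaks down on tightly set text or hanging indents, which is presumably why the method also weighs linguistic signals.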

public List<ParagraphElement> DetectParagraphs()

Returns

List<ParagraphElement>

A list of ParagraphElement objects in reading order, positioned in the page’s original coordinate space.

Remarks

Paragraph detection operates in normalized “view space” (deskewed and de-rotated) to improve stability, then remaps the result back to the page’s original coordinate system before returning.
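The round trip between page space and view space can be pictured as a rotation and its inverse. The sketch below assumes the skew has already been estimated as a single angle about a pivot point; `Pt` and `ViewSpace` are hypothetical names used for illustration and are not part of the LM-Kit API.

```csharp
using System;

// Hypothetical stand-in for a page coordinate; not an LM-Kit type.
public readonly record struct Pt(double X, double Y);

public static class ViewSpace
{
    // Standard 2D rotation of p about pivot by the given angle in radians.
    public static Pt Rotate(Pt p, Pt pivot, double angle)
    {
        double cos = Math.Cos(angle), sin = Math.Sin(angle);
        double dx = p.X - pivot.X, dy = p.Y - pivot.Y;
        return new Pt(pivot.X + dx * cos - dy * sin,
                      pivot.Y + dx * sin + dy * cos);
    }

    // Page -> view: undo the detected skew so that text lines become horizontal.
    public static Pt ToView(Pt p, Pt pivot, double skew) => Rotate(p, pivot, -skew);

    // View -> page: reapply the skew so results line up with the original page.
    public static Pt ToPage(Pt p, Pt pivot, double skew) => Rotate(p, pivot, skew);
}
```

Because `ToPage` is the exact inverse of `ToView`, any bounds computed in view space map back onto the original page up to floating-point error.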

If no layout information is available, the method returns a single ParagraphElement that contains a single LineElement wrapping all text elements.

Results are cached internally and invalidated whenever page content or geometry changes. Each call returns a new list composed of new ParagraphElement and LineElement instances that reference the original TextElement instances.
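One way to read the caching contract is that the analysis is cached while the wrapper objects are rebuilt on every call. The sketch below illustrates that pattern with hypothetical stand-in types (`TextItem`, `Line`, `Paragraph`, `PageSketch`); it makes no claim about how LM-Kit stores its cache. The placeholder analysis simply puts every item into one line of one paragraph, mirroring the no-layout fallback described above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; none of these are LM-Kit types.
public sealed class TextItem
{
    public string Text { get; }
    public TextItem(string text) => Text = text;
}

public sealed class Line
{
    public IReadOnlyList<TextItem> Items { get; }
    public Line(IReadOnlyList<TextItem> items) => Items = items;
}

public sealed class Paragraph
{
    public IReadOnlyList<Line> Lines { get; }
    public Paragraph(IReadOnlyList<Line> lines) => Lines = lines;
}

public sealed class PageSketch
{
    private readonly List<TextItem> _items = new();

    // Cached grouping: paragraph -> line -> indices into _items.
    private List<List<int[]>>? _cached;

    // Counts how often the (expensive) analysis actually ran.
    public int AnalysisRuns { get; private set; }

    public void Add(TextItem item)
    {
        _items.Add(item);
        _cached = null; // content changed, so the cached grouping is stale
    }

    public List<Paragraph> DetectParagraphs()
    {
        _cached ??= Analyze();

        // Fresh wrappers on every call; the TextItem instances are shared.
        return _cached
            .Select(par => new Paragraph(par
                .Select(line => new Line(line.Select(i => _items[i]).ToList()))
                .ToList()))
            .ToList();
    }

    private List<List<int[]>> Analyze()
    {
        AnalysisRuns++;
        // Placeholder analysis: one paragraph holding one line with all items.
        return new List<List<int[]>>
        {
            new List<int[]> { Enumerable.Range(0, _items.Count).ToArray() }
        };
    }
}
```

Under this reading, callers can reorder or discard the returned list without affecting later calls, while repeated calls on an unchanged page stay cheap.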