Method DetectParagraphs
DetectParagraphs()
Detects and groups lines into paragraphs via a joint layout-and-NLP analysis. Combines geometric cues (inter-line spacing ratios, indentation deltas, baseline alignment, left/right rag, column/region membership, style changes) with linguistic signals (sentence boundary confidence, discourse/continuation markers, list/quote/heading patterns, cross-line semantic cohesion). Designed to be robust to OCR noise, rotation/skew, and multilingual scripts.
public List<ParagraphElement> DetectParagraphs()
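To make the geometric cues concrete, here is a minimal sketch of two of them: the inter-line spacing ratio and the indentation delta. It is not the library's implementation. The `Line` record, the `SpacingSketch` class, and the threshold values are hypothetical, and the sketch assumes that lines arrive in top-to-bottom order with `Top` increasing downward.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a detected text line; not the library's LineElement.
public record Line(string Text, double Top, double Height, double Left);

public static class SpacingSketch
{
    // Groups lines (assumed to be in top-to-bottom order) into paragraphs.
    // A new paragraph starts when the vertical gap to the previous line exceeds
    // gapRatio * the previous line's height, or when the line starts more than
    // indentDelta units to the right of the previous line.
    // Both thresholds are illustrative values, not library defaults.
    public static List<List<Line>> Group(IReadOnlyList<Line> lines,
                                         double gapRatio = 0.6,
                                         double indentDelta = 10.0)
    {
        var paragraphs = new List<List<Line>>();
        for (int i = 0; i < lines.Count; i++)
        {
            bool startNew = i == 0;
            if (!startNew)
            {
                Line prev = lines[i - 1];
                double gap = lines[i].Top - (prev.Top + prev.Height);
                bool largeGap = gap > gapRatio * prev.Height;
                bool indented = lines[i].Left - prev.Left > indentDelta;
                startNew = largeGap || indented;
            }

            if (startNew)
            {
                paragraphs.Add(new List<Line>());
            }
            paragraphs[paragraphs.Count - 1].Add(lines[i]);
        }
        return paragraphs;
    }
}
```

A real detector would weigh these cues together with the linguistic signals rather than apply hard thresholds; the sketch only shows how a spacing ratio and an indentation delta can be computed from line geometry.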
Returns
- List<ParagraphElement>
A list of ParagraphElement objects in reading order, positioned in the page’s original coordinate space.
Remarks
Paragraph detection operates in normalized “view space” (deskewed and de-rotated) to improve stability, then remaps the result back to the page’s original coordinate system before returning.
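The round trip between page space and view space can be illustrated with a plain rotation. This is a sketch under stated assumptions, not the library's transform: `Point2` and `ViewSpaceSketch` are hypothetical names, the skew is modeled as a single rotation about the origin, and any translation or scaling that the real normalization may apply is ignored.

```csharp
using System;

// Hypothetical 2D point type used only for this illustration.
public readonly record struct Point2(double X, double Y);

public static class ViewSpaceSketch
{
    // Rotates p about the origin by angleDegrees using the standard rotation matrix.
    public static Point2 Rotate(Point2 p, double angleDegrees)
    {
        double r = angleDegrees * Math.PI / 180.0;
        double cos = Math.Cos(r);
        double sin = Math.Sin(r);
        return new Point2(p.X * cos - p.Y * sin, p.X * sin + p.Y * cos);
    }

    // Page space -> view space: undo the detected skew.
    public static Point2 ToViewSpace(Point2 pagePoint, double skewDegrees)
        => Rotate(pagePoint, -skewDegrees);

    // View space -> page space: reapply the skew so results line up
    // with the page's original coordinates.
    public static Point2 ToPageSpace(Point2 viewPoint, double skewDegrees)
        => Rotate(viewPoint, skewDegrees);
}
```

Because the second transform is the exact inverse of the first, a point mapped into view space and back returns to its original page coordinates up to floating-point error, which is why the returned elements can be overlaid directly on the original page.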
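If no layout information is available, the method returns a list containing a single ParagraphElement, which in turn contains a single LineElement wrapping all text elements.
The analysis result is cached internally and invalidated whenever page content or geometry changes. Each call nevertheless returns a new list composed of new ParagraphElement and LineElement instances that reference the original TextElements, so callers can modify a returned list without affecting later calls, while changes to a TextElement itself are visible through every result.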
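The sketch below models these ownership semantics with stand-in types. It is an assumption-laden illustration, not the library's code: the classes here only borrow the names `TextElement`, `LineElement`, and `ParagraphElement`, `PageSketch` and `AddText` are hypothetical, and the "analysis" is a placeholder that puts every text element into one line of one paragraph.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins that only mimic the shape of the documented types.
public sealed class TextElement { public string Text = ""; }
public sealed class LineElement { public List<TextElement> Texts = new List<TextElement>(); }
public sealed class ParagraphElement { public List<LineElement> Lines = new List<LineElement>(); }

public sealed class PageSketch
{
    private readonly List<TextElement> _texts;

    // Cached grouping: paragraphs -> lines -> text elements. Null means "stale".
    private List<List<List<TextElement>>>? _cachedGrouping;

    public PageSketch(IEnumerable<TextElement> texts) => _texts = texts.ToList();

    // Any content change invalidates the cached grouping.
    public void AddText(TextElement text)
    {
        _texts.Add(text);
        _cachedGrouping = null;
    }

    public List<ParagraphElement> DetectParagraphs()
    {
        if (_cachedGrouping == null)
        {
            // Placeholder analysis: one paragraph holding one line with all texts.
            var line = new List<TextElement>(_texts);
            var paragraph = new List<List<TextElement>> { line };
            _cachedGrouping = new List<List<List<TextElement>>> { paragraph };
        }

        // Build fresh wrapper objects on every call; only the TextElement
        // references are shared between calls.
        return _cachedGrouping
            .Select(p => new ParagraphElement
            {
                Lines = p.Select(l => new LineElement { Texts = new List<TextElement>(l) }).ToList()
            })
            .ToList();
    }
}
```

In this model, two consecutive calls yield distinct list, paragraph, and line objects, while `first[0].Lines[0].Texts[0]` and `second[0].Lines[0].Texts[0]` are the same `TextElement` instance.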