Enum TextOutputMode
Controls how extracted text is aggregated and formatted when exported as plain text.
public enum TextOutputMode
Fields
RawLines = 0: Output one line per detected text line with no layout analysis.
GridAligned = 1: Preserve approximate column alignment and indentation (grid-style spacing).
ParagraphFlow = 2: Group lines into paragraphs ordered for reading; insert blank lines between paragraphs.
Structured = 3: Preserve both paragraph flow and tabular structure; optimized for semantic extraction.
Auto = 4: Inspect the page structure and automatically select the most suitable of the other formatting modes.
Remarks
- RawLines – one logical line per detected line; no grid/column analysis. Words are joined with single spaces; indentation and column alignment are not preserved.
- GridAligned – preserves approximate columns and indentation by inserting spaces according to each word's horizontal position within the page bounds; inserts 0–5 blank lines between consecutive lines, depending on the measured vertical spacing between them.
- ParagraphFlow – groups lines into paragraphs in reading order and separates paragraphs with a blank line; best for natural reading.
- Structured – keeps paragraph boundaries and tabular layouts as distinct logical blocks; suited to retrieval-augmented generation (RAG) pipelines, where semantic chunking and preserving context are critical.
- Auto – inspects document structure and selects the most suitable mode depending on layout characteristics and intended use case.
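To make the GridAligned rules concrete, here is a minimal, hypothetical C# sketch. It is not the library's implementation. It assumes each word is known only by its text and left-edge X coordinate in page units, maps that X onto a fixed character grid (the 60-column width below is an arbitrary choice), and converts the baseline-to-baseline gap into 0–5 blank lines by treating one typical line height as "no blank line". `AlignLine` and `BlankLinesBetween` are illustrative names, not members of the library's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical sketch of the GridAligned spacing rules; not the library's code.
// Word positions, page width, and grid width are illustrative assumptions.

// Place each word near column = round(x / pageWidth * columns),
// padding with spaces and keeping at least one space between words.
static string AlignLine(IEnumerable<(string Text, double X)> words, double pageWidth, int columns)
{
    var sb = new StringBuilder();
    foreach (var (text, x) in words.OrderBy(w => w.X))
    {
        int target = (int)Math.Round(x / pageWidth * columns);
        int minStart = sb.Length == 0 ? 0 : sb.Length + 1;
        sb.Append(' ', Math.Max(target, minStart) - sb.Length);
        sb.Append(text);
    }
    return sb.ToString();
}

// Convert the gap between two baselines into 0-5 blank lines:
// one line height of gap means the lines are adjacent (no blank line).
static int BlankLinesBetween(double gap, double lineHeight)
{
    int extra = (int)Math.Round(gap / lineHeight) - 1;
    return Math.Clamp(extra, 0, 5);
}

// Usage: a three-column header on a 600-unit-wide page, rendered on a 60-column grid.
var header = new[] { ("Item", 0.0), ("Qty", 300.0), ("Price", 450.0) };
Console.WriteLine(AlignLine(header, 600.0, 60));
Console.WriteLine(BlankLinesBetween(12.0, 12.0)); // prints 0
Console.WriteLine(BlankLinesBetween(40.0, 12.0)); // prints 2
```

In this sketch "Qty" starts at column 30 and "Price" at column 45, mirroring their positions at one half and three quarters of the page width. Clamping keeps a large vertical gap, such as one between sections, from producing more than five blank lines.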