Enum TextOutputMode

Namespace: LMKit.Document.Layout

Assembly: LM-Kit.NET.dll

Controls how extracted text is aggregated and formatted when exported as plain text.

public enum TextOutputMode

Fields

RawLines = 0: Output one line per detected text line with no layout analysis.
GridAligned = 1: Preserve approximate column alignment and indentation (grid-style spacing).
ParagraphFlow = 2: Group lines into paragraphs ordered for reading; insert blank lines between paragraphs.
Structured = 3: Preserve both paragraph flow and tabular structure; optimized for semantic extraction.
Auto = 4: Automatically evaluate page structure to choose the optimal formatting strategy.
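
Because the fields map to fixed numeric values (0 through 4), a mode can be read from a configuration value or command-line argument with the standard .NET enum helpers. The following is only a sketch: ModeParser and ParseOrDefault are hypothetical names, not part of LM-Kit.NET, and the code assumes the LM-Kit.NET assembly is referenced by the project.

```csharp
using System;
using LMKit.Document.Layout;

public static class ModeParser
{
    // Hypothetical helper (not part of LM-Kit.NET): turns "Structured",
    // "structured" or "3" into a TextOutputMode, or returns the fallback.
    public static TextOutputMode ParseOrDefault(string value, TextOutputMode fallback)
    {
        // Enum.TryParse accepts names and numeric strings. It also accepts
        // out-of-range numbers such as "42", so Enum.IsDefined filters those out.
        if (Enum.TryParse<TextOutputMode>(value, ignoreCase: true, out var mode)
            && Enum.IsDefined(typeof(TextOutputMode), mode))
        {
            return mode;
        }

        return fallback;
    }
}
```

Falling back to a caller-supplied default keeps a misspelled setting from throwing at startup; a stricter application might prefer to report the invalid value instead.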

Examples

Example: Extract text using different output modes.

using LMKit.Document.Layout;
using LMKit.Document.Pdf;

PdfInfo info = PdfInfo.Load("document.pdf");
PageElement page = info.Pages[0].GetLayout();

// Raw lines: one line per detected line, no formatting.
string raw = page.GetText(TextOutputMode.RawLines);

// Grid-aligned: preserves column indentation.
string grid = page.GetText(TextOutputMode.GridAligned);

// Paragraph flow: groups lines into readable paragraphs.
string flow = page.GetText(TextOutputMode.ParagraphFlow);

// Structured: best for RAG pipelines and semantic chunking.
string structured = page.GetText(TextOutputMode.Structured);

// Auto: let the engine pick the best mode for the page.
string auto = page.GetText(TextOutputMode.Auto);

Remarks

RawLines – one logical line per detected line; no grid/column analysis. Words are joined with single spaces; indentation and column alignment are not preserved.
GridAligned – preserves approximate columns and indentation by inserting spaces based on word positions within the page bounds; inserts 0–5 blank lines between text lines according to the measured inter-line spacing.
ParagraphFlow – groups lines into paragraphs in reading order and separates paragraphs with a blank line; best for natural reading.
Structured – maintains paragraph boundaries and tabular layouts as logical blocks; ideal for RAG pipelines where semantic chunking and context preservation are critical.
Auto – inspects document structure and selects the most suitable mode depending on layout characteristics and intended use case.
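
Since ParagraphFlow separates paragraphs with a blank line, its output can be split back into individual paragraphs with plain string handling. A minimal sketch, assuming the text uses "\n" line endings as described in these remarks; ParagraphSplitter is a hypothetical helper built only on standard .NET APIs, not part of LM-Kit.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParagraphSplitter
{
    // Splits text wherever one or more blank lines occur and drops
    // blocks that are empty after trimming.
    public static List<string> Split(string text)
    {
        var paragraphs = new List<string>();

        // "\n\s*\n" matches a line break, optional whitespace (including
        // further line breaks), and another line break.
        foreach (string block in Regex.Split(text, @"\n\s*\n"))
        {
            string trimmed = block.Trim();
            if (trimmed.Length > 0)
            {
                paragraphs.Add(trimmed);
            }
        }

        return paragraphs;
    }
}
```

For a made-up input such as "First paragraph,\nsecond line.\n\nSecond paragraph.", Split returns two entries, with the line break inside the first paragraph left intact. The same splitter treats the runs of blank lines that GridAligned may emit as a single separator.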

All modes analyze the page in a normalized "view space" (deskewed, de-rotated). The returned text is plain UTF-8 with Unix line endings, and trailing whitespace is trimmed.
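
When the extracted text is written to disk, the Unix line endings are kept as-is unless they are converted first. The sketch below uses only standard .NET APIs; TextExport is a hypothetical helper, and whether to convert line endings or emit a byte-order mark depends on the tool that consumes the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextExport
{
    // Converts "\n" to the platform line ending (a no-op on Linux and macOS).
    // Assumes the input contains only "\n" line breaks, as stated above;
    // text that already contains "\r\n" would need normalizing first.
    public static string ToPlatformLineEndings(string text)
    {
        return text.Replace("\n", Environment.NewLine);
    }

    // Writes the text as UTF-8 without a byte-order mark.
    public static void Save(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
    }
}
```

Keeping the conversion separate from the write lets callers store the text unchanged for downstream processing and convert only for files meant to be opened in platform-specific editors.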
