Method GetText
GetText(CancellationToken)
Extracts and returns the textual content from the attachment.
public string GetText(CancellationToken cancellationToken = default)
Parameters
- cancellationToken CancellationToken
A token to monitor for cancellation requests. Default: None.
Returns
- string
The textual content, or an empty string if no text is available.
GetText(string, CancellationToken)
Extracts and returns the textual content from the specified pages of the attachment.
public string GetText(string pageRange, CancellationToken cancellationToken = default)
Parameters
- pageRange string
A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, an empty string, or "*" to include all pages. Invalid page numbers are silently ignored.
- cancellationToken CancellationToken
A token to monitor for cancellation requests. Default: None.
Returns
- string
The extracted plain-text content from the specified pages; an empty string if no text is available or if the page range resolves to no valid pages.
Remarks
Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:
- "3": single page
- "1-5": page range (inclusive)
- "1-3, 7, 10-12": multiple ranges and individual pages
- "5-1": reversed ranges are normalized automatically
GetText(TextOutputMode, CancellationToken)
Extracts and returns the textual content using the specified layout aggregation mode.
public string GetText(TextOutputMode mode, CancellationToken cancellationToken = default)
Parameters
- mode TextOutputMode
Controls how raw lines are grouped and spaced in the output. See TextOutputMode: RawLines (one line per detection), GridAligned (approximate columns/indentation), ParagraphFlow (paragraph grouping), Structured (paragraph and tabular preservation).
- cancellationToken CancellationToken
A token to observe while performing extraction. If cancellation is requested before extraction completes, an OperationCanceledException is thrown.
Returns
- string
The extracted plain-text content formatted according to mode; an empty string if no textual content is available (e.g., images without OCR or unsupported formats).
Remarks
On first call, this method may parse the underlying data and cache page elements; subsequent calls reuse the cache. For image-based attachments without extractable text, consider supplying OCR output via SetText(string) or SetText(PageElement). If you do not need a specific layout mode, use GetText(CancellationToken).
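Because extraction may parse the underlying data on first call, it can be useful to bound it with a timeout. The following is a minimal sketch, not part of the documented API: `ExtractionWithTimeout` and `TryExtract` are hypothetical names, and the `extract` delegate stands in for a call such as `token => attachment.GetText(TextOutputMode.ParagraphFlow, token)`, where `attachment` is an assumed variable holding the attachment object.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the documented API).
public static class ExtractionWithTimeout
{
    // Runs the supplied extraction delegate with a token that is cancelled
    // once the timeout elapses. Returns false if the extraction was cancelled.
    public static bool TryExtract(
        Func<CancellationToken, string> extract,
        TimeSpan timeout,
        out string text)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                text = extract(cts.Token);
                return true;
            }
            catch (OperationCanceledException)
            {
                // Thrown when cancellation is requested before extraction completes.
                text = string.Empty;
                return false;
            }
        }
    }
}
```

Whether to return an empty string or rethrow on cancellation is a design choice; the sketch returns false so the caller can tell a cancelled extraction apart from an attachment that simply has no text.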
GetText(TextOutputMode, string, CancellationToken)
Extracts and returns the textual content from the specified pages, formatted with the given mode.
public string GetText(TextOutputMode mode, string pageRange, CancellationToken cancellationToken = default)
Parameters
- mode TextOutputMode
Controls how raw lines are grouped and spaced in the output. See TextOutputMode: RawLines (one line per detection), GridAligned (approximate columns/indentation), ParagraphFlow (paragraph grouping), Structured (paragraph and tabular preservation).
- pageRange string
A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, an empty string, or "*" to include all pages. Invalid page numbers are silently ignored.
- cancellationToken CancellationToken
A token to observe while performing extraction. If cancellation is requested before extraction completes, an OperationCanceledException is thrown.
Returns
- string
The extracted plain-text content from the specified pages, formatted according to mode; an empty string if no textual content is available or if the page range resolves to no valid pages.
Remarks
On first call, this method may parse the underlying data and cache page elements; subsequent
calls reuse the cache. The pageRange filter is applied after extraction.
Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:
- "3": single page
- "1-5": page range (inclusive)
- "1-3, 7, 10-12": multiple ranges and individual pages
- "5-1": reversed ranges are normalized automatically
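To make the page-range rules in the Remarks concrete, here is a minimal sketch of how such a specification could be resolved to page numbers. It is not the library's implementation: `PageRangeSketch`, `Parse`, and the `pageCount` parameter are hypothetical, and clamping out-of-bounds ranges to the document's pages is one assumed reading of "invalid page numbers are silently ignored".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the documented API); it only
// illustrates the documented page-range rules.
public static class PageRangeSketch
{
    // Returns the selected 1-based page numbers in ascending order.
    public static SortedSet<int> Parse(string pageRange, int pageCount)
    {
        var pages = new SortedSet<int>();

        // null, empty string, or "*" selects every page.
        if (string.IsNullOrEmpty(pageRange) || pageRange.Trim() == "*")
        {
            for (int p = 1; p <= pageCount; p++) pages.Add(p);
            return pages;
        }

        foreach (var part in pageRange.Split(','))
        {
            var bounds = part.Split('-');
            if (bounds.Length > 2) continue;                        // malformed, skip
            if (!int.TryParse(bounds[0].Trim(), out int first)) continue;

            int last = first;                                       // single page by default
            if (bounds.Length == 2 && !int.TryParse(bounds[1].Trim(), out last)) continue;

            // Reversed ranges such as "5-1" are normalized.
            if (first > last) (first, last) = (last, first);

            // Assumption: pages outside 1..pageCount are dropped silently.
            for (int p = Math.Max(first, 1); p <= Math.Min(last, pageCount); p++)
                pages.Add(p);
        }
        return pages;
    }
}
```

For example, under these assumptions `PageRangeSketch.Parse("5-1", 10)` yields pages 1 through 5, and a specification that names only pages beyond `pageCount` yields an empty set, which corresponds to the empty-string return described above.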