Method GetText

Namespace
LMKit.Data
Assembly
LM-Kit.NET.dll

GetText(CancellationToken)

Extracts and returns the textual content from the attachment.

public string GetText(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

A token to monitor for cancellation requests. Defaults to CancellationToken.None.

Returns

string

The extracted plain-text content; an empty string if no text is available.

GetText(string, CancellationToken)

Extracts and returns the textual content from the specified pages of the attachment.

public string GetText(string pageRange, CancellationToken cancellationToken = default)

Parameters

pageRange string

A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, empty string, or "*" to include all pages. Invalid page numbers are silently ignored.

cancellationToken CancellationToken

A token to monitor for cancellation requests. Defaults to CancellationToken.None.

Returns

string

The extracted plain-text content from the specified pages; an empty string if no text is available or if the page range resolves to no valid pages.

Remarks

Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:

  • "3" - single page
  • "1-5" - page range (inclusive)
  • "1-3, 7, 10-12" - multiple ranges and individual pages
  • "5-1" - reversed ranges are normalized automatically
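
The range rules above can be illustrated with a small standalone parser. This is only a sketch of the documented semantics, not LM-Kit's implementation: `PageRangeSketch.Resolve` and its `pageCount` parameter are hypothetical names, and skipping non-numeric or malformed tokens is an assumption that goes beyond what this page states.

```csharp
using System;
using System.Collections.Generic;

public static class PageRangeSketch
{
    // Hypothetical helper: resolves a page range string to sorted,
    // distinct 1-based page numbers within 1..pageCount.
    public static List<int> Resolve(string pageRange, int pageCount)
    {
        var pages = new SortedSet<int>();

        // null, empty string, or "*" selects every page.
        if (string.IsNullOrEmpty(pageRange) || pageRange.Trim() == "*")
        {
            for (int p = 1; p <= pageCount; p++) pages.Add(p);
            return new List<int>(pages);
        }

        foreach (string token in pageRange.Split(','))
        {
            string[] bounds = token.Split('-');
            if (bounds.Length > 2) continue; // malformed token: skipped (assumption)
            if (!int.TryParse(bounds[0].Trim(), out int first)) continue;

            int last = first; // a single page such as "3"
            if (bounds.Length == 2 && !int.TryParse(bounds[1].Trim(), out last)) continue;

            // Reversed ranges such as "5-1" are normalized.
            if (first > last) (first, last) = (last, first);

            // Page numbers outside 1..pageCount are dropped silently.
            for (int p = Math.Max(first, 1); p <= Math.Min(last, pageCount); p++)
            {
                pages.Add(p);
            }
        }

        return new List<int>(pages);
    }
}
```

Under these assumptions, `PageRangeSketch.Resolve("1-3, 7, 10-12", 12)` yields pages 1, 2, 3, 7, 10, 11, 12, and `PageRangeSketch.Resolve("5-1", 10)` yields pages 1 through 5.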

GetText(TextOutputMode, CancellationToken)

Extracts and returns the textual content using the specified layout aggregation mode.

public string GetText(TextOutputMode mode, CancellationToken cancellationToken = default)

Parameters

mode TextOutputMode

Controls how raw lines are grouped and spaced in the output. See TextOutputMode: RawLines (one line per detection), GridAligned (approximate columns/indentation), ParagraphFlow (paragraph grouping), Structured (paragraph and tabular preservation).

cancellationToken CancellationToken

A token to observe while performing extraction. If cancellation is requested before extraction completes, an OperationCanceledException is thrown.

Returns

string

The extracted plain-text content formatted according to mode; an empty string if no textual content is available (e.g., images without OCR or unsupported formats).

Remarks

On first call, this method may parse the underlying data and cache page elements; subsequent calls reuse the cache. For image-based attachments without extractable text, consider supplying OCR output via SetText(string) or SetText(PageElement). If you do not need a specific layout mode, use GetText(CancellationToken).
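
The cancellation and empty-result behavior described for this overload suggests a calling pattern along the following lines. It is a minimal sketch: `ExtractionSketch.TryGetText` is a hypothetical helper, and it receives the extraction call as a delegate so that it depends only on the signature documented here. With a real attachment instance (assumed here to be a variable named `attachment`), the delegate could be `ct => attachment.GetText(TextOutputMode.ParagraphFlow, ct)`.

```csharp
using System;
using System.Threading;

public static class ExtractionSketch
{
    // Hypothetical helper: runs an extraction delegate under a timeout.
    // Returns true with the text when extraction completes with non-empty output;
    // returns false when it was canceled or produced no text.
    public static bool TryGetText(
        Func<CancellationToken, string> getText,
        TimeSpan timeout,
        out string text)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                text = getText(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Documented behavior: cancellation before completion throws.
                text = string.Empty;
                return false;
            }
        }

        // An empty string means no textual content was available
        // (for example, an image without OCR).
        return !string.IsNullOrEmpty(text);
    }
}
```

When the helper returns false for an image-based attachment, the caller could run its own OCR step and supply the result through SetText(string), as the remarks suggest.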

GetText(TextOutputMode, string, CancellationToken)

Extracts and returns the textual content from the specified pages, formatted with the given mode.

public string GetText(TextOutputMode mode, string pageRange, CancellationToken cancellationToken = default)

Parameters

mode TextOutputMode

Controls how raw lines are grouped and spaced in the output. See TextOutputMode: RawLines (one line per detection), GridAligned (approximate columns/indentation), ParagraphFlow (paragraph grouping), Structured (paragraph and tabular preservation).

pageRange string

A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, empty string, or "*" to include all pages. Invalid page numbers are silently ignored.

cancellationToken CancellationToken

A token to observe while performing extraction. If cancellation is requested before extraction completes, an OperationCanceledException is thrown.

Returns

string

The extracted plain-text content from the specified pages, formatted according to mode; an empty string if no textual content is available or if the page range resolves to no valid pages.

Remarks

On first call, this method may parse the underlying data and cache page elements; subsequent calls reuse the cache. The pageRange filter is applied after extraction. Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:

  • "3" - single page
  • "1-5" - page range (inclusive)
  • "1-3, 7, 10-12" - multiple ranges and individual pages
  • "5-1" - reversed ranges are normalized automatically
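
Because the first call may parse and cache page elements, one way to consume a long document is to request it in consecutive page chunks. The sketch below only builds the inclusive range strings; `PageChunkSketch.BuildRanges` is a hypothetical helper, and the page count has to come from elsewhere, since this page does not document how to obtain it.

```csharp
using System;
using System.Collections.Generic;

public static class PageChunkSketch
{
    // Hypothetical helper: splits pages 1..pageCount into inclusive
    // range strings such as "1-10", "11-20", "21-23".
    public static List<string> BuildRanges(int pageCount, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var ranges = new List<string>();
        for (int first = 1; first <= pageCount; first += chunkSize)
        {
            int last = Math.Min(first + chunkSize - 1, pageCount);

            // A single-page chunk uses the "3" form rather than "3-3".
            ranges.Add(first == last ? first.ToString() : first + "-" + last);
        }

        return ranges;
    }
}
```

Each resulting string can then be passed as the pageRange argument, for example `attachment.GetText(TextOutputMode.Structured, range, cancellationToken)`, where `attachment` is an assumed instance of the type this method belongs to.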