Method GetTextAsync

Namespace
LMKit.Data
Assembly
LM-Kit.NET.dll

GetTextAsync(CancellationToken)

Asynchronously extracts and returns the textual content from the attachment.

public Task<string> GetTextAsync(CancellationToken cancellationToken = default)

Parameters

cancellationToken CancellationToken

A token to monitor for cancellation requests. Defaults to CancellationToken.None.

Returns

Task<string>

A task whose result is the extracted textual content, or an empty string if no text is available.

GetTextAsync(string, CancellationToken)

Asynchronously extracts and returns the textual content from the specified pages of the attachment.

public Task<string> GetTextAsync(string pageRange, CancellationToken cancellationToken = default)

Parameters

pageRange string

A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, empty string, or "*" to include all pages. Invalid page numbers are silently ignored.

cancellationToken CancellationToken

A token to monitor for cancellation requests. Defaults to CancellationToken.None.

Returns

Task<string>

A task whose result is the extracted plain-text content from the specified pages; an empty string if no text is available or if the page range resolves to no valid pages.

Remarks

Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:

  • "3" - single page
  • "1-5" - page range (inclusive)
  • "1-3, 7, 10-12" - multiple ranges and individual pages
  • "5-1" - reversed ranges are normalized automatically
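
The range rules above can be illustrated with a small stand-alone sketch. PageRangeSketch and Resolve are hypothetical names introduced here for illustration only; this is not the parser used by LM-Kit.NET. It assumes that a range partly outside the document (for example "3-99" on a 5-page file) is trimmed to the pages that exist, which the documentation does not state explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LM-Kit.NET: it only mirrors the documented rules.
public static class PageRangeSketch
{
    // Returns the sorted, distinct 1-based page numbers selected by pageRange
    // for a document that has pageCount pages.
    public static List<int> Resolve(string pageRange, int pageCount)
    {
        // null, empty string, or "*" selects every page.
        if (string.IsNullOrWhiteSpace(pageRange) || pageRange.Trim() == "*")
            return Enumerable.Range(1, Math.Max(pageCount, 0)).ToList();

        var pages = new SortedSet<int>();
        foreach (var part in pageRange.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var bounds = part.Split('-');
            if (bounds.Length > 2) continue;                        // malformed item: skip it
            if (!int.TryParse(bounds[0].Trim(), out int first)) continue;

            int last = first;                                       // single page such as "3"
            if (bounds.Length == 2 && !int.TryParse(bounds[1].Trim(), out last)) continue;

            // Reversed ranges such as "5-1" are normalized to "1-5".
            if (first > last) (first, last) = (last, first);

            // Assumption: page numbers outside 1..pageCount are dropped silently.
            for (int p = Math.Max(first, 1); p <= Math.Min(last, pageCount); p++)
                pages.Add(p);
        }
        return pages.ToList();
    }
}
```

Under these assumptions, Resolve("1-3, 7, 10-12", 12) selects pages 1, 2, 3, 7, 10, 11, and 12, while Resolve("99", 5) selects nothing, which corresponds to the empty-string result described under Returns.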

GetTextAsync(TextOutputMode, CancellationToken)

Asynchronously extracts and returns the textual content formatted with the given mode.

public Task<string> GetTextAsync(TextOutputMode mode, CancellationToken cancellationToken = default)

Parameters

mode TextOutputMode

Controls how raw lines are grouped and spaced in the output. One of the TextOutputMode values: RawLines, GridAligned, ParagraphFlow, or Structured.

cancellationToken CancellationToken

A token to observe while performing extraction. If cancellation is requested, the operation throws OperationCanceledException.

Returns

Task<string>

A task that completes with the extracted plain-text content (UTF-8, Unix line endings) formatted according to mode; the result is an empty string when the attachment has no extractable text.

Remarks

The first invocation performs extraction and caches page elements; later calls reuse the cache. The layout mode is applied at formatting time without re-extracting text. For image-only inputs, provide OCR text via SetText(string) or SetText(PageElement) to obtain non-empty output. If you want the default layout, use GetTextAsync(CancellationToken).
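
The cancellation behavior described for the cancellationToken parameter can be handled with a timeout-bound CancellationTokenSource. The sketch below is a minimal illustration that runs without the library: ITextSource and SlowTextSource are hypothetical stand-ins that only mirror the documented GetTextAsync(CancellationToken) signature, and GetTextOrNullAsync is an illustrative helper name, not an LM-Kit.NET API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in that mirrors the documented GetTextAsync(CancellationToken) shape.
public interface ITextSource
{
    Task<string> GetTextAsync(CancellationToken cancellationToken = default);
}

// Fake source that simulates a slow extraction so the example is self-contained.
public sealed class SlowTextSource : ITextSource
{
    public async Task<string> GetTextAsync(CancellationToken cancellationToken = default)
    {
        // Task.Delay throws TaskCanceledException (an OperationCanceledException)
        // when the token is cancelled before the delay elapses.
        await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        return "extracted text";
    }
}

public static class CancellationSketch
{
    // Returns the text, or null when extraction did not finish within the timeout.
    public static async Task<string> GetTextOrNullAsync(ITextSource source, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return await source.GetTextAsync(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Cancellation was requested before the operation completed.
                return null;
            }
        }
    }
}
```

With a real attachment, the same try/catch pattern would wrap the call to GetTextAsync; returning null on timeout is a design choice of this sketch, and callers may prefer to let the exception propagate.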

GetTextAsync(TextOutputMode, string, CancellationToken)

Asynchronously extracts and returns the textual content from the specified pages, formatted with the given mode.

public Task<string> GetTextAsync(TextOutputMode mode, string pageRange, CancellationToken cancellationToken = default)

Parameters

mode TextOutputMode

Controls how raw lines are grouped and spaced in the output. One of the TextOutputMode values: RawLines, GridAligned, ParagraphFlow, or Structured.

pageRange string

A page range specification using 1-based page numbers (e.g., "1-5, 7, 9-12"). Use null, empty string, or "*" to include all pages. Invalid page numbers are silently ignored.

cancellationToken CancellationToken

A token to observe while performing extraction. If cancellation is requested, the operation throws OperationCanceledException.

Returns

Task<string>

A task that completes with the extracted plain-text content from the specified pages, formatted according to mode; the result is an empty string when the attachment has no extractable text or if the page range resolves to no valid pages.

Remarks

The first invocation performs extraction and caches page elements; later calls reuse the cache. The pageRange filter is applied after extraction. Page numbers in the range are 1-based (first page is 1). Ranges can be specified as:

  • "3" - single page
  • "1-5" - page range (inclusive)
  • "1-3, 7, 10-12" - multiple ranges and individual pages
  • "5-1" - reversed ranges are normalized automatically
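
The extract-once, filter-afterwards behavior described in these remarks can be modeled roughly as follows. This is a hypothetical illustration, not LM-Kit.NET's implementation: CachedPageText, ExtractionCount, and the extractor delegate are names invented for this sketch, pages are represented as plain strings, and the class is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model of the documented behavior; not LM-Kit.NET's implementation.
public sealed class CachedPageText
{
    private readonly Func<CancellationToken, Task<IReadOnlyList<string>>> _extract;
    private IReadOnlyList<string> _pages; // null until the first successful extraction

    // Number of times the expensive extraction step has actually run.
    public int ExtractionCount { get; private set; }

    public CachedPageText(Func<CancellationToken, Task<IReadOnlyList<string>>> extract)
    {
        _extract = extract;
    }

    // pageNumbers are 1-based; numbers outside the document are skipped.
    public async Task<string> GetTextAsync(IEnumerable<int> pageNumbers, CancellationToken ct = default)
    {
        if (_pages == null)
        {
            _pages = await _extract(ct); // expensive step, performed on the first call only
            ExtractionCount++;
        }

        // The page filter is applied to the cached pages, after extraction.
        var selected = pageNumbers
            .Where(p => p >= 1 && p <= _pages.Count)
            .Select(p => _pages[p - 1]);
        return string.Join("\n", selected);
    }
}
```

In this model, requesting different page selections repeatedly leaves ExtractionCount at 1, and a selection that matches no page yields an empty string.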