
Class TextExtraction

Namespace: LMKit.Extraction
Assembly: LM-Kit.NET.dll

Provides functionality to extract structured data from unstructured text and images using a language model.

public sealed class TextExtraction
Inheritance
TextExtraction

Remarks

The TextExtraction class lets you define a set of elements to extract from text content and image attachments. It uses a language model to parse the content and extract a value for each specified element.
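The following is a minimal end-to-end sketch, not an official sample. Several names are assumptions not documented on this page: that `LM` lives in the `LMKit.Model` namespace and can be loaded from a local model path, that `Elements` accepts a `List<TextExtractionElement>`, and that `TextExtractionElement` has a `(name, ElementType, description)` constructor. The model path and input text are placeholders.

```csharp
using System.Collections.Generic;
using System.Threading;
using LMKit.Extraction;
using LMKit.Model; // assumption: namespace that contains LM

// Assumption: LM can be loaded from a local model file; the path is a placeholder.
var model = new LM("path/to/model.gguf");
var extraction = new TextExtraction(model);

// Define what to extract.
// Assumption: TextExtractionElement(name, ElementType, description) and ElementType.String exist.
extraction.Elements = new List<TextExtractionElement>
{
    new TextExtractionElement("CustomerName", ElementType.String, "Full name of the customer"),
    new TextExtractionElement("InvoiceNumber", ElementType.String, "Identifier printed on the invoice"),
};

// Provide the unstructured text to analyze.
extraction.SetContent("Invoice INV-2024-017 issued to Jane Doe.");

// Run the extraction synchronously. The shape of the returned object is not covered on this page.
var result = extraction.Parse(CancellationToken.None);
```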

Constructors

TextExtraction(LM)

Initializes a new instance of the TextExtraction class with the specified language model.

Fields

Description

Gets or sets the description of the current extraction schema. SetElementsFromJsonSchema populates this value automatically from the schema's "description" keyword.

Title

Gets or sets the title of the current extraction schema. SetElementsFromJsonSchema populates this value automatically from the schema's "title" keyword.

Properties

Elements

Gets or sets the list of TextExtractionElement instances that define the elements to extract from the content.

Guidance

Gets or sets semantic guidance that steers how the language model interprets the content during extraction.

MaximumContextLength

Gets or sets the maximum context length (in tokens) allowed for the language model during text extraction.

NullOnDoubt

Gets or sets a value indicating whether the language model returns null for an element when it is uncertain about the content, instead of risking an aggressive extraction that produces false positives.

OcrEngine

Gets or sets the OcrEngine used to perform OCR on image attachments.

PreferredInferenceModality

Gets or sets the preferred modality for inference. This determines whether text, image, or both modalities are used when processing input. Defaults to Multimodal.
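The tuning properties above can be set in an object initializer, as in this sketch. It assumes `Guidance` is a `string`, `MaximumContextLength` an `int`, and `NullOnDoubt` a `bool`, and that `LM` lives in `LMKit.Model`; the values are illustrative, not recommended defaults. `PreferredInferenceModality` is left at its documented default (Multimodal).

```csharp
using LMKit.Extraction;
using LMKit.Model; // assumption: namespace that contains LM

var model = new LM("path/to/model.gguf"); // placeholder path

var extraction = new TextExtraction(model)
{
    // Domain hint for the model (assumed to be a plain string).
    Guidance = "The content is a supplier invoice written in English.",

    // Upper bound on the context, in tokens, used during extraction; illustrative value.
    MaximumContextLength = 4096,

    // Prefer a null value over a guessed one when the model is unsure.
    NullOnDoubt = true,
};
```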

Methods

ClearContent()

Removes all previously set input (both text and attachments) so that no content remains for extraction.

Parse(CancellationToken)

Parses the content synchronously to extract the defined elements.

ParseAsync(CancellationToken)

Parses the content asynchronously to extract the defined elements.
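A sketch of ParseAsync with a timeout, written as C# 11 top-level statements (raw string literal plus top-level `await`). It assumes `LM` lives in `LMKit.Model` and that cancellation surfaces as an `OperationCanceledException`, the usual .NET convention; the schema uses only basic JSON Schema keywords, since the supported subset is not specified here.

```csharp
using System;
using System.Threading;
using LMKit.Extraction;
using LMKit.Model; // assumption: namespace that contains LM

var model = new LM("path/to/model.gguf"); // placeholder path
var extraction = new TextExtraction(model);

// Define a single element through a JSON schema.
extraction.SetElementsFromJsonSchema("""
{
  "type": "object",
  "properties": {
    "orderId": { "type": "string", "description": "Order identifier" }
  }
}
""");
extraction.SetContent("Your order A-1042 has shipped.");

// Cancel automatically if the extraction runs longer than two minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
try
{
    var result = await extraction.ParseAsync(cts.Token);
}
catch (OperationCanceledException)
{
    Console.WriteLine("Extraction was cancelled or timed out.");
}
```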

SetContent(Attachment)

Sets an image attachment to be processed for data extraction.

SetContent(IEnumerable<Attachment>)

Sets multiple image attachments to be processed for data extraction.
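A sketch of extracting from several scanned pages with SetContent(IEnumerable<Attachment>). It assumes `Attachment` lives in `LMKit.Data` and can be constructed from an image file path, and that `LM` lives in `LMKit.Model`; the file names are placeholders. The sketch does not set `OcrEngine`, and whether a particular model can read images directly is not specified on this page.

```csharp
using System.Collections.Generic;
using System.Threading;
using LMKit.Data;       // assumption: namespace that contains Attachment
using LMKit.Extraction;
using LMKit.Model;      // assumption: namespace that contains LM

var model = new LM("path/to/model.gguf"); // placeholder path
var extraction = new TextExtraction(model);

extraction.SetElementsFromJsonSchema("""
{
  "type": "object",
  "properties": {
    "invoiceNumber": { "type": "string", "description": "Invoice identifier" }
  }
}
""");

// Assumption: Attachment(string path) loads an image file.
var pages = new List<Attachment>
{
    new Attachment("scan-page-1.png"),
    new Attachment("scan-page-2.png"),
};
extraction.SetContent(pages);

var result = extraction.Parse(CancellationToken.None);
```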

SetContent(string)

Sets the text content from which the elements will be extracted.
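One TextExtraction instance can be reused across documents by clearing the previous input before setting new text, as this sketch does. Calling ClearContent first is a defensive choice: this page does not state whether SetContent(string) also discards earlier attachments. `LM` in `LMKit.Model` is an assumption, and the inputs are placeholders.

```csharp
using System.Threading;
using LMKit.Extraction;
using LMKit.Model; // assumption: namespace that contains LM

var model = new LM("path/to/model.gguf"); // placeholder path
var extraction = new TextExtraction(model);

extraction.SetElementsFromJsonSchema("""
{
  "type": "object",
  "properties": {
    "senderName": { "type": "string", "description": "Name of the email sender" }
  }
}
""");

string[] emails =
{
    "Hi team, the report is attached. Regards, Alice",
    "Please confirm the meeting time. Thanks, Bob",
};

foreach (var email in emails)
{
    // Drop any text or attachments left over from the previous iteration.
    extraction.ClearContent();
    extraction.SetContent(email);

    var result = extraction.Parse(CancellationToken.None);
    // Consume the result for this email here.
}
```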

SetElementsFromJsonSchema(string)

Configures the text extraction elements by parsing a JSON schema.
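A sketch of schema-driven configuration that also reads back the Title and Description fields, which SetElementsFromJsonSchema fills from the schema's "title" and "description" keywords. It requires C# 11 for the raw string literal and assumes `LM` lives in `LMKit.Model`. Only basic JSON Schema keywords are used, because the exact subset the parser supports is not documented here.

```csharp
using System;
using LMKit.Extraction;
using LMKit.Model; // assumption: namespace that contains LM

var model = new LM("path/to/model.gguf"); // placeholder path
var extraction = new TextExtraction(model);

const string schema = """
{
  "title": "Invoice",
  "description": "Key fields of a supplier invoice",
  "type": "object",
  "properties": {
    "invoiceNumber": { "type": "string", "description": "Invoice identifier" },
    "totalAmount":   { "type": "number", "description": "Total amount due" }
  }
}
""";

// Configures the extraction elements from the schema and populates Title and Description.
extraction.SetElementsFromJsonSchema(schema);

Console.WriteLine(extraction.Title);       // taken from the schema's "title"
Console.WriteLine(extraction.Description); // taken from the schema's "description"
```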