Class TextExtraction

Namespace: LMKit.Extraction

Assembly: LM-Kit.NET.dll

Provides functionality to extract structured data from unstructured text and images using a language model.

public sealed class TextExtraction

Inheritance: object → TextExtraction

Inherited Members:

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The TextExtraction class lets you define a set of elements to extract from text content and image attachments. It uses a language model to parse the content and extract the specified elements.
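The workflow described in these remarks can be sketched as follows, using only members listed on this page. This is a minimal illustration, not a reference implementation: the `LMKit.Model` namespace for `LM` is an assumption, `ExtractionSketch` and `Run` are hypothetical names, and because this page does not document how to load a model or construct a `TextExtractionElement`, both are passed in by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using LMKit.Extraction;
using LMKit.Model; // assumption: the namespace that declares LM

public static class ExtractionSketch // hypothetical helper, not part of LM-Kit
{
    public static void Run(LM model, List<TextExtractionElement> elements, string text)
    {
        // Bind the extractor to an already-loaded language model.
        var extraction = new TextExtraction(model);

        // Declare which elements should be extracted.
        extraction.Elements = elements;

        // Provide the unstructured text to analyze.
        extraction.SetContent(text);

        // Run the extraction synchronously. The result type is not documented
        // on this page, so it is left to type inference here.
        var result = extraction.Parse(CancellationToken.None);

        // Writes the result's ToString() representation.
        Console.WriteLine(result);
    }
}
```

Passing the model and element list in keeps the sketch independent of constructor signatures that this page does not show.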

Constructors

TextExtraction(LM): Initializes a new instance of the TextExtraction class with the specified language model.

Fields

Description: Gets or sets the description for the current extraction schema. This value is automatically populated from the schema's "description" field when calling SetElementsFromJsonSchema.

Title: Gets or sets the title for the current extraction schema. This value is automatically populated from the schema's "title" field when calling SetElementsFromJsonSchema.

Properties

Elements: Gets or sets the list of TextExtractionElement instances that define the elements to extract from the content.

Guidance: Gets or sets semantic guidance for the extraction process.

JsonSchema: Gets the JSON schema representation of the extraction elements.

MaximumContextLength: Gets or sets the maximum context length (in tokens) allowed for the language model during text extraction.

Model: Gets the language model instance used to drive the extraction process.

NullOnDoubt: When true, the language model returns null for content it is uncertain about instead of risking an aggressive extraction that produces false positives.

OcrEngine: Gets or sets an optional OcrEngine used to perform traditional OCR on raster content.

PreferredInferenceModality: Gets or sets the preferred modality for inference. This determines whether text, image, or both modalities are used when processing input. Defaults to Multimodal.
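As a rough illustration of how the properties above might be combined, the sketch below applies a conservative configuration to an existing instance. It assumes that `MaximumContextLength` accepts an integer and that `Guidance` is a string; neither type is stated on this page. `ExtractionSettingsSketch`, `ApplyConservativeSettings`, and the literal values are hypothetical.

```csharp
using LMKit.Extraction;

public static class ExtractionSettingsSketch // hypothetical helper, not part of LM-Kit
{
    public static void ApplyConservativeSettings(TextExtraction extraction)
    {
        // Prefer a null value over a guess when the model is uncertain.
        extraction.NullOnDoubt = true;

        // Cap the context size in tokens. 4096 is an illustrative value only;
        // assumption: the property is integer-typed.
        extraction.MaximumContextLength = 4096;

        // Assumption: Guidance is a free-form string describing the input.
        extraction.Guidance = "The content is a scanned invoice; amounts are in euros.";
    }
}
```

Enabling `NullOnDoubt` trades recall for precision, which tends to suit pipelines where a missing value is cheaper than a wrong one.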

Methods

ClearContent(): Removes all previously set input (both text and attachments) so that no content remains for extraction.

Parse(CancellationToken): Parses the content synchronously to extract the defined elements.

ParseAsync(CancellationToken): Parses the content asynchronously to extract the defined elements.

SetContent(Attachment): Sets an image attachment to be processed for data extraction.

SetContent(ImageBuffer): Sets the content for extraction from the specified image buffer.

SetContent(IEnumerable<Attachment>): Sets multiple image attachments to be processed for data extraction.

SetContent(string): Sets the text content from which the elements will be extracted.
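The methods above can be combined to process several documents with one instance. The sketch below is a hedged example rather than the library's prescribed pattern: it assumes `ParseAsync` returns a `Task<T>` whose result can be awaited, and it calls `ClearContent()` between documents because this page does not state whether `SetContent` replaces earlier input. `BatchExtractionSketch` and `RunAsync` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using LMKit.Extraction;

public static class BatchExtractionSketch // hypothetical helper, not part of LM-Kit
{
    public static async Task RunAsync(
        TextExtraction extraction,
        IEnumerable<string> documents,
        TimeSpan timeout)
    {
        foreach (string document in documents)
        {
            // Give each document its own time budget; the token is cancelled
            // once the timeout elapses.
            using var cts = new CancellationTokenSource(timeout);

            extraction.SetContent(document);

            // Assumption: ParseAsync returns Task<T>; the result type is not
            // documented on this page, so it is inferred.
            var result = await extraction.ParseAsync(cts.Token);
            Console.WriteLine(result);

            // Drop the current input so nothing carries over to the next document.
            extraction.ClearContent();
        }
    }
}
```

Reusing one configured instance avoids redefining `Elements` for every document. How cancellation is reported to the caller is not specified on this page, so the sketch does not catch any exception.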