Class TextExtraction
- Namespace
- LMKit.Extraction
- Assembly
- LM-Kit.NET.dll
Provides functionality to extract structured data from unstructured text and images using a language model.
public sealed class TextExtraction
- Inheritance
- object
TextExtraction
- Inherited Members
Remarks
The TextExtraction class lets you define the set of elements to extract from text content and image attachments. It uses a language model to parse the content and extract the specified elements.
Constructors
- TextExtraction(LM)
Initializes a new instance of the TextExtraction class with the specified language model.
Fields
- Description
Gets or sets the description for the current extraction schema. This value is automatically populated from the schema's "description" field when calling SetElementsFromJsonSchema.
- Title
Gets or sets the title for the current extraction schema. This value is automatically populated from the schema's "title" field when calling SetElementsFromJsonSchema.
Properties
- Elements
Gets or sets the list of TextExtractionElement instances that define the elements to extract from the content.
- Guidance
Gets or sets semantic guidance for the extraction process.
- MaximumContextLength
Gets or sets the maximum context length (in tokens) allowed for the language model during text extraction.
- NullOnDoubt
When true, the language model returns null for content it is uncertain about, rather than risking an aggressive extraction that produces false positives.
- PreferredInferenceModality
Gets or sets the preferred modality for inference. This determines whether text, image, or both modalities are used when processing input. Defaults to Multimodal.
Methods
- ClearContent()
Removes all previously set input (both text and attachments) so that no content remains for extraction.
- Parse(CancellationToken)
Parses the content synchronously to extract the defined elements.
- ParseAsync(CancellationToken)
Parses the content asynchronously to extract the defined elements.
- SetContent(Attachment)
Sets an image attachment to be processed for data extraction.
- SetContent(IEnumerable<Attachment>)
Sets multiple image attachments to be processed for data extraction.
- SetContent(string)
Sets the text content from which the elements will be extracted.
- SetElementsFromJsonSchema(string)
Configures the text extraction elements by parsing a JSON schema.
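The members listed above could be combined roughly as follows. This is a minimal sketch, not an official sample: it calls only the constructor, property, and methods named on this page. The `LMKit.Model` namespace for `LM`, the helper names `ExtractionSketch` and `ExtractInvoice`, the contents of the JSON schema, and the assumption that `Parse` returns a result object are all illustrative guesses, so check them against the library reference before relying on them.

```csharp
using System;
using System.Threading;
using LMKit.Extraction;
using LMKit.Model; // assumption: the namespace that declares LM

static class ExtractionSketch // hypothetical helper, not part of LM-Kit
{
    // Takes an already-loaded model so the sketch does not have to guess
    // how an LM instance is constructed.
    public static object ExtractInvoice(LM model, string text)
    {
        var extractor = new TextExtraction(model);

        // Prefer null over a risky guess when the model is unsure.
        extractor.NullOnDoubt = true;

        // Illustrative schema; which JSON Schema keywords are supported
        // is not stated on this page.
        const string schema = @"{
  ""title"": ""Invoice"",
  ""description"": ""Key fields of an invoice."",
  ""type"": ""object"",
  ""properties"": {
    ""invoiceNumber"": { ""type"": ""string"" },
    ""total"": { ""type"": ""number"" }
  }
}";
        extractor.SetElementsFromJsonSchema(schema);

        // Title and Description are documented as being populated
        // from the schema's "title" and "description" fields.
        Console.WriteLine(extractor.Title);
        Console.WriteLine(extractor.Description);

        extractor.SetContent(text);

        // Assumption: Parse returns a result object; its shape is not
        // described in this section, so it is passed back untyped.
        var result = extractor.Parse(CancellationToken.None);

        // Drop the input so the instance can be reused with new content.
        extractor.ClearContent();
        return result;
    }
}
```

For image input, the same flow would presumably use the `SetContent(Attachment)` or `SetContent(IEnumerable<Attachment>)` overloads in place of `SetContent(string)`, and `ParseAsync(CancellationToken)` in place of `Parse` when the caller should not block.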