Class ExtractionTrainingDataset

Namespace
LMKit.Extraction.Training
Assembly
LM-Kit.NET.dll

Training dataset builder specialized for the Text Extraction engine. Converts extraction configurations and labeled examples into ChatTrainingSample items usable for supervised fine-tuning.

public sealed class ExtractionTrainingDataset : TrainingDataset
Inheritance
ExtractionTrainingDataset
Examples

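using System.Collections.Generic;
using LMKit.Extraction;
using LMKit.Extraction.Training;

// Build a fine-tuning dataset from labeled extraction samples.
// `model` is assumed to be an already-loaded model instance.
var extractor = new TextExtraction(model);

// Declare the fields the engine should extract.
extractor.Elements = new List<TextExtractionElement>
{
    new TextExtractionElement("InvoiceId", ElementType.String),
    new TextExtractionElement("Total", ElementType.Double)
};

// Bind the dataset to the extractor configuration.
var dataset = new ExtractionTrainingDataset(extractor)
{
    EnableModalityAugmentation = true
};

// Pair raw input text with its ground-truth JSON.
dataset.AddSample(
    "Invoice #A-100, Total: $250.00",
    "{\"InvoiceId\":\"A-100\",\"Total\":250.00}");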

Remarks

This dataset uses the current TextExtraction configuration (elements, prompts, model, and preferred modality) to synthesize ShareGPT-style chat conversations where the assistant response is the JSON completion for the labeled ground truth.
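To make the "ShareGPT-style" wording concrete, the sketch below builds one record in the commonly used ShareGPT layout: a `conversations` array of turns, each with a `from` role and a `value`. It only illustrates the general convention. `ShareGptSketch` and `BuildRecord` are hypothetical names that are not part of LM-Kit, and the prompt text that LM-Kit actually synthesizes from the extraction configuration is not documented here, so the `prompt` argument is a stand-in.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not an LM-Kit type): shows the general shape of a
// ShareGPT-style record, where the assistant turn holds the ground-truth JSON.
public static class ShareGptSketch
{
    public static string BuildRecord(string prompt, string groundTruthJson)
    {
        var record = new Dictionary<string, object>
        {
            ["conversations"] = new[]
            {
                // User turn: the extraction prompt plus the source content (assumed wording).
                new Dictionary<string, string> { ["from"] = "human", ["value"] = prompt },
                // Assistant turn: the labeled JSON completion, stored as a string.
                new Dictionary<string, string> { ["from"] = "gpt", ["value"] = groundTruthJson }
            }
        };

        return JsonSerializer.Serialize(record);
    }
}
```

Calling `ShareGptSketch.BuildRecord("Extract InvoiceId and Total from: Invoice #A-100, Total: $250.00", "{\"InvoiceId\":\"A-100\",\"Total\":250.00}")` returns a single JSON object with two turns, the second of which carries the ground-truth JSON as its value.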

Constructors

ExtractionTrainingDataset(TextExtraction)

Initializes an extraction-focused training dataset bound to a specific TextExtraction configuration.

Properties

EnableModalityAugmentation

Gets or sets a value indicating whether modality-augmented samples are added when the engine runs in Multimodal mode.

Methods

AddSample(Attachment, string)

Adds a training sample from an Attachment using the engine’s preferred modality.

AddSample(InferenceModality, Attachment, string)

Adds a training sample with an explicit InferenceModality.

AddSample(string, string)

Adds a training sample from raw text content using the engine’s preferred modality.
