Class ExtractionTrainingDataset
- Namespace
- LMKit.Extraction.Training
- Assembly
- LM-Kit.NET.dll
Training dataset builder specialized for the Text Extraction engine. Converts extraction configurations and labeled examples into ChatTrainingSample items usable for supervised fine-tuning.
public sealed class ExtractionTrainingDataset : TrainingDataset
- Inheritance
- ExtractionTrainingDataset
Examples
// Build a fine-tuning dataset from labeled extraction samples.
// 'model' is assumed to be an already loaded model instance.
var extractor = new TextExtraction(model);
extractor.Elements = new List<TextExtractionElement>
{
    new TextExtractionElement("InvoiceId", ElementType.String),
    new TextExtractionElement("Total", ElementType.Double)
};

var dataset = new ExtractionTrainingDataset(extractor)
{
    EnableModalityAugmentation = true
};

// Each sample pairs the raw input text with its ground-truth JSON.
dataset.AddSample(
    "Invoice #A-100, Total: $250.00",
    "{\"InvoiceId\":\"A-100\",\"Total\":250.00}");
Remarks
This dataset uses the current TextExtraction configuration (elements, prompts, model, and preferred modality) to synthesize ShareGPT-style chat conversations where the assistant response is the JSON completion for the labeled ground truth.
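To make the "ShareGPT-style" shape concrete, here is a minimal sketch of what one such conversation could look like when serialized. It does not use any LM-Kit type: the ShareGptSketch class, the BuildSample helper, and the instruction string are hypothetical, and the prompt that the library actually generates is not shown in this page. Only the general ShareGPT layout (a "conversations" array of from/value turns, with the assistant turn holding the ground-truth JSON) is illustrated.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of LM-Kit: shows the general ShareGPT layout only.
public static class ShareGptSketch
{
    // Builds one conversation: a human turn (instruction + document text)
    // followed by an assistant turn containing the ground-truth JSON.
    public static string BuildSample(string instruction, string content, string groundTruthJson)
    {
        var sample = new Dictionary<string, object>
        {
            ["conversations"] = new[]
            {
                new Dictionary<string, string>
                {
                    ["from"] = "human",
                    ["value"] = instruction + "\n\n" + content
                },
                new Dictionary<string, string>
                {
                    ["from"] = "gpt",
                    ["value"] = groundTruthJson
                }
            }
        };

        return JsonSerializer.Serialize(sample, new JsonSerializerOptions { WriteIndented = true });
    }

    public static void Main()
    {
        // The instruction text is a placeholder; the real prompt is derived
        // from the TextExtraction configuration and is not reproduced here.
        string json = BuildSample(
            "Extract InvoiceId and Total as JSON.",
            "Invoice #A-100, Total: $250.00",
            "{\"InvoiceId\":\"A-100\",\"Total\":250.00}");

        Console.WriteLine(json);
    }
}
```

The assistant turn stores the labeled JSON as a plain string, which matches the remark that the assistant response is the JSON completion for the ground truth.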
Constructors
- ExtractionTrainingDataset(TextExtraction)
Initializes an extraction-focused training dataset bound to a specific TextExtraction configuration.
Properties
- EnableModalityAugmentation
Gets or sets whether modality-augmented samples are added when the engine runs in the Multimodal modality.
Methods
- AddSample(Attachment, string)
Adds a training sample from an Attachment using the engine’s preferred modality.
- AddSample(InferenceModality, Attachment, string)
Adds a training sample with an explicit InferenceModality.
- AddSample(string, string)
Adds a training sample from raw text content using the engine’s preferred modality.