
Class ExtractionTrainingDataset

Namespace
LMKit.Extraction.Training
Assembly
LM-Kit.NET.dll

Training dataset builder specialized for the Text Extraction engine. Converts extraction configurations and labeled examples into ChatTrainingSample items usable for supervised fine-tuning.

public sealed class ExtractionTrainingDataset : TrainingDataset
Inheritance
ExtractionTrainingDataset

Remarks

This dataset uses the current TextExtraction configuration (elements, prompts, model, and preferred modality) to synthesize ShareGPT-style chat conversations where the assistant response is the JSON completion for the labeled ground truth.
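To make the shape of such a sample concrete, the following is a minimal sketch of one ShareGPT-style record, assuming the widely used "conversations" / "from" / "value" layout with "human" and "gpt" roles. The class name ShareGptSketch, the helper BuildRecord, and the invoice text are hypothetical and chosen for illustration. The exact structure that LM-Kit emits (for example system prompts or schema instructions) is not described here and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative only: this class is not part of LM-Kit.NET.
public static class ShareGptSketch
{
    // Builds one ShareGPT-style record: a human turn carrying the source
    // content and an assistant ("gpt") turn whose value is the ground-truth
    // JSON string for the labeled example.
    public static string BuildRecord(string content, string groundTruthJson)
    {
        var record = new Dictionary<string, object>
        {
            ["conversations"] = new[]
            {
                new Dictionary<string, string> { ["from"] = "human", ["value"] = content },
                new Dictionary<string, string> { ["from"] = "gpt", ["value"] = groundTruthJson },
            }
        };
        return JsonSerializer.Serialize(record);
    }

    public static void Main()
    {
        // Hypothetical labeled example: raw text plus the JSON expected from extraction.
        string json = BuildRecord(
            "Invoice 1042, total due 250.00 EUR",
            "{\"invoice_number\":\"1042\",\"total\":250.00}");
        Console.WriteLine(json);
    }
}
```

The point of the sketch is that the assistant turn contains only the JSON completion, so a model fine-tuned on such samples learns to answer with structured output for the configured elements.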

Constructors

ExtractionTrainingDataset(TextExtraction)

Initializes an extraction-focused training dataset bound to a specific TextExtraction configuration.

Properties

EnableModalityAugmentation

Gets or sets whether modality-augmented samples are added when the engine runs in Multimodal mode.

Methods

AddSample(Attachment, string)

Adds a training sample from an Attachment using the engine’s preferred modality.

AddSample(InferenceModality, Attachment, string)

Adds a training sample with an explicit InferenceModality.

AddSample(string, string)

Adds a training sample from raw text content using the engine’s preferred modality.
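The members listed above might be combined roughly as follows. This is a hedged sketch rather than verified usage: it relies only on the constructor, property, and AddSample(string, string) overload named on this page, and it assumes that TextExtraction lives in the LMKit.Extraction namespace, that EnableModalityAugmentation is a bool, and that the second string argument is the ground-truth JSON for the sample (as the Remarks suggest). The class DatasetSketch, the method BuildDataset, and the sample text are hypothetical. Compiling it requires a reference to LM-Kit.NET.dll.

```csharp
using LMKit.Extraction;          // assumed namespace of TextExtraction
using LMKit.Extraction.Training;

// Illustrative helper, not part of LM-Kit.NET.
public static class DatasetSketch
{
    // Takes an already configured TextExtraction instance (elements, prompts,
    // model) so that this sketch does not guess at how one is constructed.
    public static ExtractionTrainingDataset BuildDataset(TextExtraction extractor)
    {
        var dataset = new ExtractionTrainingDataset(extractor);

        // Assumed to be a bool; per the property description it only has an
        // effect when the engine runs in Multimodal mode.
        dataset.EnableModalityAugmentation = true;

        // Raw text content plus (assumed) ground-truth JSON for that content.
        dataset.AddSample(
            "Invoice 1042, total due 250.00 EUR",
            "{\"invoice_number\":\"1042\",\"total\":250.00}");

        return dataset;
    }
}
```

Passing the extractor in as a parameter keeps the dataset bound to whatever configuration the caller has already set up, which matches the constructor's description.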
