Class PiiExtractionTrainingDataset

Namespace: LMKit.TextAnalysis.Training

Assembly: LM-Kit.NET.dll

Training dataset builder specialized for the PII/Entity Extraction engine. Converts labeled entity annotations into ChatTrainingSample items usable for supervised fine-tuning.

public sealed class PiiExtractionTrainingDataset : TrainingDataset

Inheritance: object → TrainingDataset → PiiExtractionTrainingDataset

Inherited Members:

TrainingDataset.Samples

TrainingDataset.AddSample(ChatTrainingSample)

TrainingDataset.ExportAsSharegpt(string, bool, string, CancellationToken)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Examples

// Complete example: Build a PII extraction training dataset
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);

var dataset = new PiiExtractionTrainingDataset(pii)
{
    EnableModalityAugmentation = true
};

// Add multiple labeled samples
dataset.AddSample(
    "Contact: Alice Martin, phone +33 6 12 34 56 78.",
    new[]
    {
        new EntityAnnotation("Person", "Alice Martin"),
        new EntityAnnotation("PhoneNumber", "+33 6 12 34 56 78")
    });

dataset.AddSample(
    "Email john.doe@acme.com for invoice #INV-2024-001.",
    new[]
    {
        new EntityAnnotation("EmailAddress", "john.doe@acme.com")
    });

// Export to ShareGPT format for fine-tuning
dataset.ExportAsSharegpt("pii_training_dataset.json", overwrite: true);

Remarks

This dataset uses the current PiiExtraction configuration (entity types, prompts, model, and preferred modality) to synthesize ShareGPT-style chat conversations. In each conversation, the assistant response reflects the ground-truth labels supplied as EntityAnnotation instances.
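To make the idea of a ShareGPT-style conversation concrete, the following self-contained sketch builds one record by hand with System.Text.Json. It does not use LM-Kit. The `ShareGptSketch` class, its `BuildRecord` helper, and the `{ type, value }` layout of the assistant answer are hypothetical, chosen only for illustration. The prompts and response schema that PiiExtractionTrainingDataset actually emits are defined by the PiiExtraction configuration and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical illustration only: not part of LM-Kit.
public static class ShareGptSketch
{
    // Builds one ShareGPT-style record: a "conversations" array holding a
    // human turn (the input text) and a gpt turn (the ground-truth labels).
    public static string BuildRecord(
        string text,
        IEnumerable<(string Type, string Value)> labels)
    {
        // Assumed answer layout: a JSON array of { type, value } objects.
        string answer = JsonSerializer.Serialize(
            labels.Select(l => new { type = l.Type, value = l.Value }));

        var record = new Dictionary<string, object>
        {
            ["conversations"] = new[]
            {
                new Dictionary<string, string> { ["from"] = "human", ["value"] = text },
                new Dictionary<string, string> { ["from"] = "gpt", ["value"] = answer }
            }
        };

        return JsonSerializer.Serialize(record);
    }
}
```

Calling `ShareGptSketch.BuildRecord("Contact: Alice Martin.", new[] { ("Person", "Alice Martin") })` returns a JSON string with two turns, the second of which carries the serialized labels.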

Constructors

PiiExtractionTrainingDataset(PiiExtraction): Initializes a training dataset for PII/entity extraction, bound to a specific PiiExtraction configuration.

Properties

EnableModalityAugmentation: Gets or sets a value indicating whether modality-augmented samples are added when the engine runs in the Multimodal modality.

Methods

AddSample(Attachment, IEnumerable<EntityAnnotation>): Adds a training sample from an Attachment using the engine's preferred modality.

AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>): Adds a training sample with an explicit InferenceModality.

AddSample(string, IEnumerable<EntityAnnotation>): Adds a training sample from raw text content using the engine's preferred modality.
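The two Attachment-based overloads listed above might be used as in the sketch below. It follows the style of the Examples section and needs LM-Kit plus a model file to run. Two details are assumptions that this page does not confirm: that Attachment can be constructed from a file path, and that InferenceModality exposes a Multimodal member. The file path is a placeholder.

```csharp
// Sketch only. Assumed, not confirmed by this page:
//   - Attachment has a constructor that takes a file path.
//   - InferenceModality has a Multimodal member.
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);

var dataset = new PiiExtractionTrainingDataset(pii);

// Ground-truth labels for the content of the attachment.
var labels = new[]
{
    new EntityAnnotation("Person", "Alice Martin"),
    new EntityAnnotation("PhoneNumber", "+33 6 12 34 56 78")
};

// Overload 1: the engine's preferred modality decides how the attachment is presented.
dataset.AddSample(new Attachment("path/to/contact_card.png"), labels);

// Overload 2: the modality is chosen explicitly for this sample.
dataset.AddSample(
    InferenceModality.Multimodal,
    new Attachment("path/to/contact_card.png"),
    labels);

// Export as in the Examples section.
dataset.ExportAsSharegpt("pii_training_dataset.json", overwrite: true);
```

The explicit-modality overload is useful when a single dataset should mix samples built under different modalities regardless of how the bound PiiExtraction instance is configured.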
