
Constructor PiiExtractionTrainingDataset

Namespace: LMKit.TextAnalysis.Training
Assembly: LM-Kit.NET.dll

PiiExtractionTrainingDataset(PiiExtraction)

Initializes a training dataset for PII/entity extraction, bound to the given PiiExtraction configuration.

public PiiExtractionTrainingDataset(PiiExtraction engine)

Parameters

engine PiiExtraction

The configured PII extraction engine whose prompts, model, supported entity types, and preferred modality are used to generate training samples.

Examples

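using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.TextAnalysis.Training;

// Load a model and create the PII extraction engine that will use it
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);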

// Optionally configure custom entity definitions
pii.PiiEntityDefinitions.Clear();
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("Person", "Full name of a person"));
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("PhoneNumber", "Phone number in any format"));
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("EmailAddress", "Email address"));

// Create the dataset after configuring the engine: its current state is captured here
var dataset = new PiiExtractionTrainingDataset(pii)
{
    EnableModalityAugmentation = true
};

// Add an annotated sample: the input text plus the entities expected in it
dataset.AddSample(
    "Contact: Alice Martin, phone +33 6 12 34 56 78.",
    new[]
    {
        new EntityAnnotation("Person", "Alice Martin"),
        new EntityAnnotation("PhoneNumber", "+33 6 12 34 56 78")
    });

// Export the synthesized training samples in ShareGPT format
dataset.ExportAsSharegpt("pii_dataset.json", overwrite: true);

Remarks

The constructor captures the current state of engine (for example, entity titles and descriptions, prompt templates, and modality preferences), so configure the engine before creating the dataset. Subsequent calls to AddSample(Attachment, IEnumerable<EntityAnnotation>) and its overloads synthesize chat histories consistent with this captured configuration.