Constructor PiiExtractionTrainingDataset
Namespace: LMKit.TextAnalysis.Training
Assembly: LM-Kit.NET.dll
PiiExtractionTrainingDataset(PiiExtraction)
Initializes a training dataset for PII and entity extraction, bound to a specific PiiExtraction configuration.
public PiiExtractionTrainingDataset(PiiExtraction engine)
Parameters
engine PiiExtraction
The configured PII extraction engine whose prompts, model, supported entity types, and preferred modality are used to generate training samples.
Examples
// Load the model and create the extraction engine.
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);

// Optionally configure custom entity definitions
pii.PiiEntityDefinitions.Clear();
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("Person", "Full name of a person"));
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("PhoneNumber", "Phone number in any format"));
pii.PiiEntityDefinitions.Add(new PiiEntityDefinition("EmailAddress", "Email address"));

// Bind the dataset to the configured engine.
var dataset = new PiiExtractionTrainingDataset(pii)
{
    EnableModalityAugmentation = true
};

// Add one annotated sample: the source text and the entities it contains.
dataset.AddSample(
    "Contact: Alice Martin, phone +33 6 12 34 56 78.",
    new[]
    {
        new EntityAnnotation("Person", "Alice Martin"),
        new EntityAnnotation("PhoneNumber", "+33 6 12 34 56 78")
    });

// Export in ShareGPT format, overwriting any existing file.
dataset.ExportAsSharegpt("pii_dataset.json", overwrite: true);
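After exporting, it can be useful to sanity-check the file before using it for fine-tuning. The following is a minimal sketch, not part of LM-Kit: ShareGptInspector and CountTurns are hypothetical names, and the code assumes the exported file follows the common ShareGPT layout (a root JSON array of records, each with a "conversations" array). The exact structure LM-Kit writes is not documented here, so verify it against an actual export.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of LM-Kit): summarizes a ShareGPT-style JSON document.
public static class ShareGptInspector
{
    // Returns the number of turns in each record.
    // Assumption: the root is a JSON array of objects, each carrying a
    // "conversations" array, as in the common ShareGPT layout.
    public static int[] CountTurns(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Array)
            throw new FormatException("Expected a JSON array at the root.");

        var counts = new int[root.GetArrayLength()];
        int i = 0;
        foreach (JsonElement record in root.EnumerateArray())
        {
            // Records without a "conversations" array are reported as 0 turns.
            counts[i++] =
                record.ValueKind == JsonValueKind.Object &&
                record.TryGetProperty("conversations", out JsonElement turns) &&
                turns.ValueKind == JsonValueKind.Array
                    ? turns.GetArrayLength()
                    : 0;
        }
        return counts;
    }
}
```

To inspect the file written by the example above, pass its contents to the helper, for instance ShareGptInspector.CountTurns(System.IO.File.ReadAllText("pii_dataset.json")). A record reported with zero turns suggests the layout differs from the assumption stated above.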
Remarks
The constructor captures the current state of engine (for example, titles and descriptions, prompt templates, and modality preferences). Subsequent calls to AddSample(Attachment, IEnumerable<EntityAnnotation>) and its overloads synthesize chat histories consistent with this configuration.