Constructor NamedEntityRecognitionTrainingDataset
- Namespace
- LMKit.TextAnalysis.Training
- Assembly
- LM-Kit.NET.dll
NamedEntityRecognitionTrainingDataset(NamedEntityRecognition)
Initializes an NER-focused training dataset bound to a specific NamedEntityRecognition configuration.
public NamedEntityRecognitionTrainingDataset(NamedEntityRecognition engine)
Parameters
engine NamedEntityRecognition
The configured NER engine whose prompts, model, supported entity definitions, and preferred modality are used to generate training samples.
Examples
using var model = new LM("path/to/model.gguf");

// Use default entity definitions (Person, Organization, Location, Date, etc.)
var ner = new NamedEntityRecognition(model);

// Bind the dataset to the NER engine's configuration.
var dataset = new NamedEntityRecognitionTrainingDataset(ner)
{
    EnableModalityAugmentation = true
};

// Add one text sample together with its annotated entities.
dataset.AddSample(
    "Microsoft was founded by Bill Gates and Paul Allen in Albuquerque on April 4, 1975.",
    new[]
    {
        new EntityAnnotation("Organization", "Microsoft"),
        new EntityAnnotation("Person", "Bill Gates"),
        new EntityAnnotation("Person", "Paul Allen"),
        new EntityAnnotation("Location", "Albuquerque"),
        new EntityAnnotation("Date", "April 4, 1975")
    });

// Export the dataset, replacing the output file if it already exists.
dataset.ExportAsSharegpt("ner_dataset.json", overwrite: true);
Remarks
The constructor captures the current state of engine (for example, its entity definitions, prompt templates, and modality preferences). Subsequent calls to AddSample(Attachment, IEnumerable<EntityAnnotation>) and its overloads synthesize chat histories consistent with this configuration.