Constructor NamedEntityRecognitionTrainingDataset

Namespace
LMKit.TextAnalysis.Training
Assembly
LM-Kit.NET.dll

NamedEntityRecognitionTrainingDataset(NamedEntityRecognition)

Initializes a new named entity recognition (NER) training dataset bound to the configuration of the specified NamedEntityRecognition engine.

public NamedEntityRecognitionTrainingDataset(NamedEntityRecognition engine)

Parameters

engine NamedEntityRecognition

The configured NER engine whose prompts, model, supported entity definitions, and preferred modality are used to generate training samples.

Examples

using var model = new LM("path/to/model.gguf");

// Use default entity definitions (Person, Organization, Location, Date, etc.)
var ner = new NamedEntityRecognition(model);

var dataset = new NamedEntityRecognitionTrainingDataset(ner)
{
    EnableModalityAugmentation = true
};

dataset.AddSample(
    "Microsoft was founded by Bill Gates and Paul Allen in Albuquerque on April 4, 1975.",
    new[]
    {
        new EntityAnnotation("Organization", "Microsoft"),
        new EntityAnnotation("Person", "Bill Gates"),
        new EntityAnnotation("Person", "Paul Allen"),
        new EntityAnnotation("Location", "Albuquerque"),
        new EntityAnnotation("Date", "April 4, 1975")
    });

dataset.ExportAsSharegpt("ner_dataset.json", overwrite: true);

Remarks

The constructor captures the current state of engine (for example, its entity definitions, prompt templates, and modality preferences). Subsequent calls to AddSample(Attachment, IEnumerable<EntityAnnotation>) and its overloads synthesize chat histories consistent with this captured configuration.
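The following self-contained sketch illustrates one plausible reading of "captures the current state". It assumes the dataset copies the engine's settings at construction time, so that edits made to the engine afterwards do not affect samples the dataset generates. NerConfig, SnapshotDataset, its AddSample method, and the system/user/assistant message layout are all hypothetical stand-ins, not LM-Kit APIs. Whether LM-Kit copies its configuration in exactly this way is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an NER engine configuration (NOT an LM-Kit type).
public sealed class NerConfig
{
    public List<string> EntityTypes { get; } =
        new List<string> { "Person", "Organization", "Location", "Date" };
    public string SystemPrompt { get; set; } = "Extract named entities.";
}

// Hypothetical dataset that snapshots its configuration (NOT an LM-Kit type).
public sealed class SnapshotDataset
{
    private readonly string[] _entityTypes; // copied once, in the constructor
    private readonly string _systemPrompt;  // copied once, in the constructor

    // Each sample is a chat history of (role, content) messages.
    public List<List<(string Role, string Content)>> Samples { get; } =
        new List<List<(string Role, string Content)>>();

    public SnapshotDataset(NerConfig config)
    {
        // Copy the settings so later edits to `config` are not observed here.
        _entityTypes = config.EntityTypes.ToArray();
        _systemPrompt = config.SystemPrompt;
    }

    public void AddSample(string text, IEnumerable<(string Label, string Value)> annotations)
    {
        var list = annotations.ToList();

        // Reject labels that were not part of the captured entity definitions.
        foreach (var a in list)
        {
            if (!_entityTypes.Contains(a.Label))
                throw new ArgumentException($"Unknown entity type '{a.Label}'.");
        }

        // Build a chat history whose prompt and expected answer follow the snapshot.
        string types = string.Join(", ", _entityTypes);
        string answer = string.Join("\n", list.Select(a => $"{a.Label}: {a.Value}"));
        Samples.Add(new List<(string Role, string Content)>
        {
            ("system", $"{_systemPrompt} Allowed types: {types}."),
            ("user", text),
            ("assistant", answer)
        });
    }
}
```

In this sketch, adding "Product" to config.EntityTypes after constructing the dataset has no effect on it, and a sample annotated with "Product" is rejected. Under this reading, you would finalize the engine's entity definitions and prompts before creating the dataset.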