Class NamedEntityRecognitionTrainingDataset

Namespace
LMKit.TextAnalysis.Training
Assembly
LM-Kit.NET.dll

Training dataset builder specialized for the Named Entity Recognition (NER) engine. Converts labeled entity annotations into ChatTrainingSample items usable for supervised fine-tuning.

```csharp
public sealed class NamedEntityRecognitionTrainingDataset : TrainingDataset
```
Inheritance
TrainingDataset
NamedEntityRecognitionTrainingDataset

Examples

```csharp
using LMKit.Model;
using LMKit.TextAnalysis;
using LMKit.TextAnalysis.Training;

// Complete example: build a NER training dataset
using var model = new LM("path/to/model.gguf");
var ner = new NamedEntityRecognition(model);

// The dataset reuses the engine's entity definitions, prompts, and preferred modality
var dataset = new NamedEntityRecognitionTrainingDataset(ner)
{
    EnableModalityAugmentation = true
};

// Add labeled samples: raw text plus the ground-truth entities it contains
dataset.AddSample(
    "Apple Inc. announced that CEO Tim Cook will visit Paris next Monday.",
    new[]
    {
        new EntityAnnotation("Organization", "Apple Inc."),
        new EntityAnnotation("Person", "Tim Cook"),
        new EntityAnnotation("Location", "Paris"),
        new EntityAnnotation("Date", "next Monday")
    });

dataset.AddSample(
    "The Eiffel Tower was completed in 1889 by Gustave Eiffel.",
    new[]
    {
        new EntityAnnotation("Location", "Eiffel Tower"),
        new EntityAnnotation("Date", "1889"),
        new EntityAnnotation("Person", "Gustave Eiffel")
    });

// Export to ShareGPT format for fine-tuning, overwriting any existing file
dataset.ExportAsSharegpt("ner_training_dataset.json", overwrite: true);
```

Remarks

This dataset uses the current NamedEntityRecognition configuration (entity definitions, prompts, model, and preferred modality) to synthesize ShareGPT-style chat conversations where the assistant response reflects the ground-truth labels provided via EntityAnnotation instances.
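The sketch below shows one possible ShareGPT-style conversation in which the assistant turn is derived only from the ground-truth labels. It is self-contained and does not call LM-Kit. The instruction wording, the helper name `BuildShareGptRecord`, the `conversations`/`from`/`value` keys (the common ShareGPT convention), and the JSON shape of the assistant answer are all illustrative assumptions. The file that `ExportAsSharegpt` actually writes may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of LM-Kit): builds one ShareGPT-style record from
// raw text and (label, value) pairs. It illustrates the idea only; LM-Kit's real
// prompts and output format are generated from the NamedEntityRecognition configuration.
string BuildShareGptRecord(string instruction, string text, IEnumerable<(string Label, string Value)> annotations)
{
    // The assistant answer is built solely from the ground-truth annotations.
    var entities = new List<Dictionary<string, string>>();
    foreach (var (label, value) in annotations)
    {
        entities.Add(new Dictionary<string, string> { ["label"] = label, ["value"] = value });
    }

    var record = new Dictionary<string, object>
    {
        ["conversations"] = new List<Dictionary<string, string>>
        {
            // User turn: an assumed instruction followed by the text to analyze.
            new Dictionary<string, string> { ["from"] = "human", ["value"] = instruction + "\n\n" + text },
            // Assistant turn: the expected extraction, serialized as a JSON array.
            new Dictionary<string, string> { ["from"] = "gpt", ["value"] = JsonSerializer.Serialize(entities) }
        }
    };
    return JsonSerializer.Serialize(record);
}

string json = BuildShareGptRecord(
    "Extract Person and Location entities.",
    "The Eiffel Tower was completed in 1889 by Gustave Eiffel.",
    new[] { ("Location", "Eiffel Tower"), ("Person", "Gustave Eiffel") });

Console.WriteLine(json);
```

The key point is that the model's input comes from the sample text. The target output comes only from the annotations, so labeling errors flow directly into what the model learns.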

Constructors

NamedEntityRecognitionTrainingDataset(NamedEntityRecognition)

Initializes a NER-focused training dataset bound to a specific NamedEntityRecognition configuration.

Properties

EnableModalityAugmentation

Gets or sets whether to add modality-augmented samples when the engine runs in the Multimodal inference modality.

Methods

AddSample(Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample from an Attachment using the engine's preferred modality.

AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample from an Attachment using an explicitly specified InferenceModality instead of the engine's preferred modality.

AddSample(string, IEnumerable<EntityAnnotation>)

Adds a training sample from raw text content using the engine's preferred modality.
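The assistant turn of each sample reflects the annotations you supply. An annotation whose text never appears in the input can therefore teach the model to report entities that are not there. One simple guard before calling `AddSample(string, IEnumerable<EntityAnnotation>)` is to check that every annotated value occurs verbatim in the text. The helper `FindMissingAnnotations` below is hypothetical and not part of LM-Kit. It uses plain `(label, value)` tuples instead of `EntityAnnotation`, because this page does not document that type's properties, and it assumes an exact, case-sensitive match is the right criterion for your data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pre-check (not an LM-Kit API): returns the annotations whose value is
// empty or does not occur verbatim (ordinal, case-sensitive) in the source text.
List<(string Label, string Value)> FindMissingAnnotations(
    string text, IEnumerable<(string Label, string Value)> annotations)
{
    var missing = new List<(string Label, string Value)>();
    foreach (var annotation in annotations)
    {
        if (string.IsNullOrWhiteSpace(annotation.Value) ||
            !text.Contains(annotation.Value, StringComparison.Ordinal))
        {
            missing.Add(annotation);
        }
    }
    return missing;
}

var sampleText = "Apple Inc. announced that CEO Tim Cook will visit Paris next Monday.";
var sampleAnnotations = new[]
{
    ("Organization", "Apple Inc."),
    ("Person", "Tim Cook"),
    ("Location", "London") // deliberately wrong: "London" is not in the text
};

var problems = FindMissingAnnotations(sampleText, sampleAnnotations);
foreach (var (label, value) in problems)
{
    Console.WriteLine($"Not found in text: {label} = \"{value}\"");
}
```

Samples that fail this check can be fixed or skipped before they reach the dataset. If your labels legitimately normalize the surface text, for example by changing case or expanding abbreviations, relax the comparison accordingly.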

See Also