Method AddSample

Namespace: LMKit.TextAnalysis.Training
Assembly: LM-Kit.NET.dll

AddSample(string, IEnumerable<EntityAnnotation>)

Adds a training sample from raw text content using the engine's preferred modality.

public void AddSample(string content, IEnumerable<EntityAnnotation> annotations)

Parameters

content string

The textual content to analyze for named entities.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

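// Array.Empty<T>() below needs the System namespace (skip if implicit usings are enabled).
using System;

using var model = new LM("path/to/model.gguf");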
var ner = new NamedEntityRecognition(model);
var dataset = new NamedEntityRecognitionTrainingDataset(ner);

// Add samples with various entity types
dataset.AddSample(
    "Amazon reported $134 billion in revenue for Q3 2023.",
    new[]
    {
        new EntityAnnotation("Organization", "Amazon"),
        new EntityAnnotation("Money", "$134 billion"),
        new EntityAnnotation("Date", "Q3 2023")
    });

dataset.AddSample(
    "The Treaty of Versailles was signed on June 28, 1919.",
    new[]
    {
        new EntityAnnotation("Event", "Treaty of Versailles"),
        new EntityAnnotation("Date", "June 28, 1919")
    });

// Negative sample: no entities present
dataset.AddSample(
    "The weather is nice today.",
    Array.Empty<EntityAnnotation>());

dataset.ExportAsSharegpt("ner_dataset.json", overwrite: true);

AddSample(Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample from an Attachment using the engine's preferred modality.

public void AddSample(Attachment content, IEnumerable<EntityAnnotation> annotations)

Parameters

content Attachment

The input attachment (e.g., text, image, or multimodal source) to analyze.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

using var model = new LM("path/to/model.gguf");
var ner = new NamedEntityRecognition(model);
var dataset = new NamedEntityRecognitionTrainingDataset(ner);

// Create attachment from text
var textAttachment = Attachment.CreateFromText(
    "President Biden met with Chancellor Scholz in Berlin on February 15, 2024.",
    "news");

dataset.AddSample(
    textAttachment,
    new[]
    {
        new EntityAnnotation("Person", "President Biden"),
        new EntityAnnotation("Person", "Chancellor Scholz"),
        new EntityAnnotation("Location", "Berlin"),
        new EntityAnnotation("Date", "February 15, 2024")
    });

// Create attachment from image file (e.g., scanned news article)
var imageAttachment = Attachment.CreateFromFile("news_clipping.png");

dataset.AddSample(
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Organization", "United Nations"),
        new EntityAnnotation("Location", "Geneva"),
        new EntityAnnotation("Date", "March 2024")
    });

dataset.ExportAsSharegpt("multimodal_ner_dataset.json", overwrite: true);

AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample with an explicit InferenceModality.

public void AddSample(InferenceModality modality, Attachment content, IEnumerable<EntityAnnotation> annotations)

Parameters

modality InferenceModality

The inference modality to use for generating prompts and responses.

content Attachment

The content attachment to analyze.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

using var model = new LM("path/to/model.gguf");
var ner = new NamedEntityRecognition(model);
var dataset = new NamedEntityRecognitionTrainingDataset(ner);

var attachment = Attachment.CreateFromText(
    "SpaceX launched Falcon 9 from Cape Canaveral carrying Starlink satellites.",
    "text");

// Explicitly specify Text modality
dataset.AddSample(
    InferenceModality.Text,
    attachment,
    new[]
    {
        new EntityAnnotation("Organization", "SpaceX"),
        new EntityAnnotation("Product", "Falcon 9"),
        new EntityAnnotation("Location", "Cape Canaveral"),
        new EntityAnnotation("Product", "Starlink")
    });

// Add a Vision-only sample from an image
var imageAttachment = Attachment.CreateFromFile("business_card.png");

dataset.AddSample(
    InferenceModality.Vision,
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Person", "John Smith"),
        new EntityAnnotation("Organization", "Acme Corporation"),
        new EntityAnnotation("Phone", "+1 555-123-4567"),
        new EntityAnnotation("Email", "john.smith@acme.com")
    });

dataset.ExportAsSharegpt("modality_specific_ner_dataset.json", overwrite: true);

Remarks

This method assembles a ShareGPT-style conversation from the configured prompts and appends a ChatTrainingSample whose assistant response reflects the provided annotations. When EnableModalityAugmentation is true and modality is Multimodal, additional samples are appended for Text and Vision.
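The behavior described above can be pictured with a small, self-contained sketch. It does not call LM-Kit.NET: SampleModality, Annotation, Turn, Sample, and DatasetSketch are hypothetical stand-ins, and both the system prompt and the layout of the assistant response are assumptions made for illustration. The sketch also assumes that the original Multimodal sample is kept and that the Text and Vision samples are added after it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: a Multimodal sample with augmentation enabled expands to three samples.
var annotations = new[] { new Annotation("Organization", "SpaceX") };
var augmented = DatasetSketch.AddSample(
    SampleModality.Multimodal, "SpaceX launched Falcon 9.", annotations,
    enableModalityAugmentation: true);
var plain = DatasetSketch.AddSample(
    SampleModality.Text, "SpaceX launched Falcon 9.", annotations,
    enableModalityAugmentation: true);

Console.WriteLine(string.Join(", ", augmented.Select(s => s.Modality))); // Multimodal, Text, Vision
Console.WriteLine(plain.Count); // 1

// Hypothetical stand-ins for illustration; these are not the LM-Kit types.
enum SampleModality { Text, Vision, Multimodal }
record Annotation(string Label, string Text);
record Turn(string From, string Value);
record Sample(SampleModality Modality, IReadOnlyList<Turn> Conversations);

static class DatasetSketch
{
    // Assumed prompt; in the real library the prompts come from the engine configuration.
    const string SystemPrompt = "Extract the named entities from the input.";

    // Builds one ShareGPT-style conversation: system, human, then the assistant answer.
    static Sample Build(SampleModality modality, string content, IReadOnlyList<Annotation> annotations)
    {
        // The assistant turn reflects the ground-truth annotations (format is an assumption).
        string answer = string.Join("\n", annotations.Select(a => $"{a.Label}: {a.Text}"));
        return new Sample(modality, new[]
        {
            new Turn("system", SystemPrompt),
            new Turn("human", content),
            new Turn("gpt", answer),
        });
    }

    // Returns the samples that a single AddSample call would append under these assumptions.
    internal static List<Sample> AddSample(
        SampleModality modality, string content,
        IEnumerable<Annotation> annotations, bool enableModalityAugmentation)
    {
        var list = annotations.ToList();
        var samples = new List<Sample> { Build(modality, content, list) };

        // Augmentation only applies to Multimodal input.
        if (enableModalityAugmentation && modality == SampleModality.Multimodal)
        {
            samples.Add(Build(SampleModality.Text, content, list));
            samples.Add(Build(SampleModality.Vision, content, list));
        }
        return samples;
    }
}
```

In this sketch, Text and Vision inputs always yield exactly one sample, so enabling augmentation only changes the dataset size for Multimodal calls.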