Method AddSample

Namespace: LMKit.TextAnalysis.Training
Assembly: LM-Kit.NET.dll

AddSample(string, IEnumerable<EntityAnnotation>)

Adds a training sample from raw text content using the engine's preferred modality.

public void AddSample(string content, IEnumerable<EntityAnnotation> annotations)

Parameters

content string

The textual content to analyze for PII/entities.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

// Add samples with various PII types
dataset.AddSample(
    "SSN: 123-45-6789",
    new[] { new EntityAnnotation("US_SSN", "123-45-6789") });

dataset.AddSample(
    "My credit card is 4111-1111-1111-1111, expires 12/25.",
    new[]
    {
        new EntityAnnotation("CreditCard", "4111-1111-1111-1111"),
        new EntityAnnotation("ExpirationDate", "12/25")
    });

dataset.AddSample(
    "No sensitive information in this text.",
    Array.Empty<EntityAnnotation>()); // Negative sample: no entities expected

dataset.ExportAsSharegpt("pii_dataset.json", overwrite: true);

AddSample(Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample from an Attachment using the engine's preferred modality.

public void AddSample(Attachment content, IEnumerable<EntityAnnotation> annotations)

Parameters

content Attachment

The input attachment (e.g., text, image, or multimodal source) to analyze.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

// Create attachment from text
var textAttachment = Attachment.CreateFromText(
    "Invoice to: Jane Smith, 123 Main St, New York, NY 10001",
    "invoice");

dataset.AddSample(
    textAttachment,
    new[]
    {
        new EntityAnnotation("Person", "Jane Smith"),
        new EntityAnnotation("Address", "123 Main St, New York, NY 10001")
    });

// Create attachment from image file
var imageAttachment = Attachment.CreateFromFile("scanned_document.png");

dataset.AddSample(
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Person", "John Doe"),
        new EntityAnnotation("DateOfBirth", "1985-03-15")
    });

dataset.ExportAsSharegpt("multimodal_dataset.json", overwrite: true);

AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>)

Adds a training sample with an explicit InferenceModality.

public void AddSample(InferenceModality modality, Attachment content, IEnumerable<EntityAnnotation> annotations)

Parameters

modality InferenceModality

The inference modality to use for generating prompts and responses.

content Attachment

The content attachment to analyze.

annotations IEnumerable<EntityAnnotation>

Ground-truth entity annotations (label + representative text) expected in content.

Examples

using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

var attachment = Attachment.CreateFromText(
    "Contact: alice@example.com, bob@example.org",
    "text");

// Explicitly specify Text modality
dataset.AddSample(
    InferenceModality.Text,
    attachment,
    new[]
    {
        new EntityAnnotation("EmailAddress", "alice@example.com"),
        new EntityAnnotation("EmailAddress", "bob@example.org")
    });

// Add a Vision-only sample from an image
var imageAttachment = Attachment.CreateFromFile("id_card_scan.png");

dataset.AddSample(
    InferenceModality.Vision,
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Person", "Marie Dupont"),
        new EntityAnnotation("IDNumber", "FR-123456789")
    });

dataset.ExportAsSharegpt("modality_specific_dataset.json", overwrite: true);

Remarks

This method assembles a ShareGPT-style conversation from the configured prompts and appends a ChatTrainingSample whose assistant response reflects the provided annotations. When EnableModalityAugmentation is true and modality is Multimodal, additional samples are appended for Text and Vision.
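
The augmentation rule described above can be illustrated with a small stand-alone sketch. It does not call LM-Kit: SketchDataset and SketchModality are hypothetical stand-ins, not library types. The sketch also assumes that the requested Multimodal sample is kept alongside the two extra ones (three in total, with Text before Vision). That is an inference from the phrase "additional samples", not behavior confirmed by the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for InferenceModality; NOT the LM-Kit enum.
public enum SketchModality { Text, Vision, Multimodal }

// Hypothetical stand-in that only records which modality each sample was built for.
public sealed class SketchDataset
{
    public bool EnableModalityAugmentation { get; set; }

    public List<SketchModality> Samples { get; } = new List<SketchModality>();

    public void AddSample(SketchModality modality)
    {
        // The requested modality always produces one sample.
        Samples.Add(modality);

        // Assumed rule: augmentation applies only to Multimodal requests and
        // appends one Text sample and one Vision sample (order is an assumption).
        if (EnableModalityAugmentation && modality == SketchModality.Multimodal)
        {
            Samples.Add(SketchModality.Text);
            Samples.Add(SketchModality.Vision);
        }
    }
}
```

Under these assumptions, a single Multimodal call with augmentation enabled records three samples, while a Text or Vision call, or any call with augmentation disabled, records exactly one.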