Method AddSample
- Namespace: LMKit.TextAnalysis.Training
- Assembly: LM-Kit.NET.dll
AddSample(string, IEnumerable<EntityAnnotation>)
Adds a training sample from raw text content using the engine's preferred modality.
public void AddSample(string content, IEnumerable<EntityAnnotation> annotations)
Parameters
content (string): The textual content to analyze for PII/entities.
annotations (IEnumerable<EntityAnnotation>): Ground-truth entity annotations (label + representative text) expected in content.
Examples
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

// Add samples with various PII types
dataset.AddSample(
    "SSN: 123-45-6789",
    new[] { new EntityAnnotation("US_SSN", "123-45-6789") });

dataset.AddSample(
    "My credit card is 4111-1111-1111-1111, expires 12/25.",
    new[]
    {
        new EntityAnnotation("CreditCard", "4111-1111-1111-1111"),
        new EntityAnnotation("ExpirationDate", "12/25")
    });

// Negative sample: the text contains no entities
dataset.AddSample(
    "No sensitive information in this text.",
    Array.Empty<EntityAnnotation>());

dataset.ExportAsSharegpt("pii_dataset.json", overwrite: true);
Remarks
The content is wrapped as a text Attachment and forwarded to AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>).
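The forwarding described in these remarks can be pictured with a small, self-contained sketch. Everything in it (DemoModality, DemoAttachment, DemoDataset, the "text" attachment name, and plain string labels in place of EntityAnnotation) is a hypothetical stand-in invented for illustration. It is not LM-Kit's implementation and it does not call the LM-Kit API; it only shows the general pattern of convenience overloads funnelling into one explicit-modality overload.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are NOT LM-Kit types.
public enum DemoModality { Text, Vision, Multimodal }

public sealed record DemoAttachment(string Name, string Text);

public sealed class DemoDataset
{
    // Assumption: the "engine's preferred modality" is modeled as a fixed constructor argument.
    private readonly DemoModality _preferred;

    // Records what reached the explicit-modality overload, so the forwarding is observable.
    public List<(DemoModality Modality, string Text, int AnnotationCount)> Added { get; } = new();

    public DemoDataset(DemoModality preferred) => _preferred = preferred;

    // String overload: wrap the raw text in an attachment, then forward with the preferred modality.
    public void AddSample(string content, IReadOnlyCollection<string> annotations) =>
        AddSample(_preferred, new DemoAttachment("text", content), annotations);

    // Attachment overload: forward as-is with the preferred modality.
    public void AddSample(DemoAttachment content, IReadOnlyCollection<string> annotations) =>
        AddSample(_preferred, content, annotations);

    // Explicit-modality overload: the single place where a sample is actually recorded.
    public void AddSample(DemoModality modality, DemoAttachment content, IReadOnlyCollection<string> annotations) =>
        Added.Add((modality, content.Text, annotations.Count));
}
```

In this sketch, calling the string or attachment overload on a DemoDataset constructed with DemoModality.Text records the sample under Text, while the three-argument overload records whatever modality the caller passes. Keeping the recording logic in one overload means the convenience overloads cannot drift apart in behavior.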
AddSample(Attachment, IEnumerable<EntityAnnotation>)
Adds a training sample from an Attachment using the engine's preferred modality.
public void AddSample(Attachment content, IEnumerable<EntityAnnotation> annotations)
Parameters
content (Attachment): The input attachment (e.g., text, image, or multimodal source) to analyze.
annotations (IEnumerable<EntityAnnotation>): Ground-truth entity annotations (label + representative text) expected in content.
Examples
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

// Create attachment from text
var textAttachment = Attachment.CreateFromText(
    "Invoice to: Jane Smith, 123 Main St, New York, NY 10001",
    "invoice");

dataset.AddSample(
    textAttachment,
    new[]
    {
        new EntityAnnotation("Person", "Jane Smith"),
        new EntityAnnotation("Address", "123 Main St, New York, NY 10001")
    });

// Create attachment from image file
var imageAttachment = Attachment.CreateFromFile("scanned_document.png");

dataset.AddSample(
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Person", "John Doe"),
        new EntityAnnotation("DateOfBirth", "1985-03-15")
    });

dataset.ExportAsSharegpt("multimodal_dataset.json", overwrite: true);
Remarks
For text inputs, prefer CreateFromText(string, string). This overload delegates to AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>).
AddSample(InferenceModality, Attachment, IEnumerable<EntityAnnotation>)
Adds a training sample with an explicit InferenceModality.
public void AddSample(InferenceModality modality, Attachment content, IEnumerable<EntityAnnotation> annotations)
Parameters
modality (InferenceModality): The inference modality to use for generating prompts and responses.
content (Attachment): The content attachment to analyze.
annotations (IEnumerable<EntityAnnotation>): Ground-truth entity annotations (label + representative text) expected in content.
Examples
using var model = new LM("path/to/model.gguf");
using var pii = new PiiExtraction(model);
var dataset = new PiiExtractionTrainingDataset(pii);

var attachment = Attachment.CreateFromText(
    "Contact: alice@example.com, bob@example.org",
    "text");

// Explicitly specify Text modality
dataset.AddSample(
    InferenceModality.Text,
    attachment,
    new[]
    {
        new EntityAnnotation("EmailAddress", "alice@example.com"),
        new EntityAnnotation("EmailAddress", "bob@example.org")
    });

// Add a Vision-only sample from an image
var imageAttachment = Attachment.CreateFromFile("id_card_scan.png");

dataset.AddSample(
    InferenceModality.Vision,
    imageAttachment,
    new[]
    {
        new EntityAnnotation("Person", "Marie Dupont"),
        new EntityAnnotation("IDNumber", "FR-123456789")
    });

dataset.ExportAsSharegpt("modality_specific_dataset.json", overwrite: true);
Remarks
This method assembles a ShareGPT-style conversation from the configured prompts and appends a ChatTrainingSample whose assistant response reflects the provided annotations. When EnableModalityAugmentation is true and modality is Multimodal, additional samples are appended for Text and Vision.
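As a rough mental model of these remarks, the sketch below builds a ShareGPT-style conversation (turns tagged "system", "human", and "gpt") and applies the augmentation rule. All type names (SketchModality, SketchTurn, SketchSample, SketchDataset), the system prompt text, and the "Label: Text" response format are assumptions made up for this illustration. The library's actual prompts, response format, and the default value of EnableModalityAugmentation are not stated on this page. The sketch also assumes the original Multimodal sample is kept alongside the two extra ones, which is one reading of "additional".

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are NOT LM-Kit types.
public enum SketchModality { Text, Vision, Multimodal }

// One ShareGPT-style turn: From is the speaker role, Value is the message text.
public sealed record SketchTurn(string From, string Value);

public sealed record SketchSample(SketchModality Modality, IReadOnlyList<SketchTurn> Conversation);

public sealed class SketchDataset
{
    // Assumption: off by default here; the library's real default is not documented on this page.
    public bool EnableModalityAugmentation { get; set; }

    public List<SketchSample> Samples { get; } = new();

    public void AddSample(SketchModality modality, string content,
                          IEnumerable<(string Label, string Text)> annotations)
    {
        // Hypothetical response format: one "Label: Text" line per annotation.
        string response = string.Join("\n", annotations.Select(a => $"{a.Label}: {a.Text}"));

        // Always append the sample for the requested modality.
        Samples.Add(Build(modality, content, response));

        // Augmentation: a Multimodal sample also yields Text and Vision variants.
        if (EnableModalityAugmentation && modality == SketchModality.Multimodal)
        {
            Samples.Add(Build(SketchModality.Text, content, response));
            Samples.Add(Build(SketchModality.Vision, content, response));
        }
    }

    private static SketchSample Build(SketchModality modality, string content, string response) =>
        new(modality, new[]
        {
            new SketchTurn("system", "Extract the entities found in the input."), // placeholder prompt
            new SketchTurn("human", content),
            new SketchTurn("gpt", response),
        });
}
```

With EnableModalityAugmentation set to true, a single Multimodal call in this sketch leaves three samples in Samples (Multimodal, Text, Vision, in that order). With it set to false, or with any other modality, exactly one sample is appended.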