Method GetTrainingData

Namespace: LMKit.TextAnalysis

Assembly: LM-Kit.NET.dll

GetTrainingData(TrainingDataset, int, bool, int?, bool)

Retrieves training data for fine-tuning a sentiment analysis model from the specified dataset.

public static List<(string, SentimentAnalysis.SentimentCategory)> GetTrainingData(SentimentAnalysis.TrainingDataset dataset, int maxSamples = 1000, bool shuffle = true, int? seed = null, bool neutralSupport = true)

Parameters

dataset SentimentAnalysis.TrainingDataset: The dataset from which to retrieve the training data.
maxSamples int: The maximum number of samples to retrieve from the dataset. The default is 1000.
shuffle bool: Indicates whether to shuffle the dataset before selecting samples. The default is true.
seed int?: An optional seed for the random number generator used when shuffling. If null, the shuffle is not seeded, so the sample order is not expected to be reproducible across runs.
neutralSupport bool: Specifies whether samples labeled as neutral are included in the returned data. The default is true.
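
The interaction between shuffle and seed can be checked with a short sketch. It assumes, as the seed description suggests but does not explicitly state, that two calls with the same seed return the samples in the same order. The variable names first and second are illustrative.

```csharp
using System;
using System.Linq;
using LMKit.TextAnalysis;

// Two calls with identical arguments and a fixed seed.
var first = SentimentAnalysis.GetTrainingData(
    SentimentAnalysis.TrainingDataset.LMKit2024_09_INT,
    maxSamples: 100,
    shuffle: true,
    seed: 42);

var second = SentimentAnalysis.GetTrainingData(
    SentimentAnalysis.TrainingDataset.LMKit2024_09_INT,
    maxSamples: 100,
    shuffle: true,
    seed: 42);

// Value tuples compare element-wise, so SequenceEqual checks both text and label.
// Assumption: a fixed seed makes the shuffle deterministic.
Console.WriteLine($"Same order with the same seed: {first.SequenceEqual(second)}");
```

If the comparison prints False, the seed should not be relied on for reproducible experiments without further checking.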

Returns

List<(string, SentimentAnalysis.SentimentCategory)>: A list of tuples, where each tuple contains a string (the text) and a SentimentAnalysis.SentimentCategory (the sentiment label).
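
Because the tuple elements are unnamed, they can be deconstructed into local names rather than accessed as Item1 and Item2. The sketch below uses only the call documented here plus standard LINQ to count how many samples carry each label. The names text and label are illustrative and not part of the API.

```csharp
using System;
using System.Linq;
using LMKit.TextAnalysis;

var trainingData = SentimentAnalysis.GetTrainingData(
    SentimentAnalysis.TrainingDataset.LMKit2024_09_INT,
    maxSamples: 500);

// Group by the second tuple element (the sentiment label) and count each group.
foreach (var group in trainingData.GroupBy(sample => sample.Item2))
{
    Console.WriteLine($"{group.Key}: {group.Count()} samples");
}

// Deconstruct each tuple into named locals instead of Item1/Item2.
foreach (var (text, label) in trainingData.Take(3))
{
    Console.WriteLine($"[{label}] {text}");
}
```

Checking the label distribution before fine-tuning is a quick way to spot an unbalanced sample.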

Examples

using System;
using LMKit.TextAnalysis;

// Retrieve up to 500 shuffled samples; the fixed seed is intended to make the selection repeatable
var trainingData = SentimentAnalysis.GetTrainingData(
    SentimentAnalysis.TrainingDataset.LMKit2024_09_INT,
    maxSamples: 500,
    shuffle: true,
    seed: 42,
    neutralSupport: true);

// Each sample is a tuple: Item1 is the text, Item2 is the sentiment label
foreach (var sample in trainingData)
{
    Console.WriteLine($"Text: {sample.Item1}, Sentiment: {sample.Item2}");
}

Remarks

This method returns labeled samples from predefined datasets, which can be used to train or fine-tune a sentiment analysis model.

Exceptions

ArgumentException: Thrown if the dataset is not recognized.