Method GetTrainingData
- Namespace
- LMKit.Translation
- Assembly
- LM-Kit.NET.dll
GetTrainingData(TrainingDataset, int, bool, int?)
Retrieves training data for fine-tuning language detection models from the specified dataset.
public static List<(string, Language)> GetTrainingData(TextTranslation.TrainingDataset dataset, int maxSamples = 1000, bool shuffle = true, int? seed = null)
Parameters
- dataset TextTranslation.TrainingDataset
The dataset identifier from the TextTranslation.TrainingDataset enumeration.
- maxSamples int
The maximum number of samples to retrieve. Default is 1000.
- shuffle bool
If set to true, the dataset is shuffled before samples are selected. Default is true.
- seed int?
An optional seed for the random number generator used during shuffling. If null, the shuffle operation is unseeded.
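The seed parameter suggests that a fixed seed makes the shuffled selection repeatable. The sketch below checks this by calling the method twice with the same arguments. It assumes the LM-Kit.NET package is referenced and that a seeded shuffle is deterministic across calls, which this page implies but does not state outright.

```csharp
using System;
using System.Linq;
using LMKit.Translation;

var dataset = TextTranslation.TrainingDataset.LanguageDetection_LMKit2024_09_INT;

// Two calls with identical arguments, including the same seed.
var first = TextTranslation.GetTrainingData(dataset, maxSamples: 100, shuffle: true, seed: 42);
var second = TextTranslation.GetTrainingData(dataset, maxSamples: 100, shuffle: true, seed: 42);

// Value tuples compare element by element, so SequenceEqual checks both
// the text and the language label of every sample, in order.
// Assumption: a fixed seed yields the same ordering on every call.
Console.WriteLine($"Same samples in same order: {first.SequenceEqual(second)}");
```

Leaving seed as null would make the comparison meaningless, since the shuffle is then unseeded.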
Returns
- List<(string, Language)>
A list of tuples where each tuple consists of:
- A string representing a text sample.
- A Language enumeration value representing the corresponding language label.
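Because each element pairs a text sample with its language label, the result can be inspected with LINQ before training. The following is a minimal sketch that counts samples per language. It assumes the LM-Kit.NET package is referenced and uses the dataset identifier shown in the Examples section.

```csharp
using System;
using System.Linq;
using LMKit.Translation;

var samples = TextTranslation.GetTrainingData(
    TextTranslation.TrainingDataset.LanguageDetection_LMKit2024_09_INT,
    maxSamples: 500);

// Item1 is the text sample, Item2 is its language label.
var countsByLanguage = samples
    .GroupBy(sample => sample.Item2)
    .OrderByDescending(group => group.Count());

foreach (var group in countsByLanguage)
{
    Console.WriteLine($"{group.Key}: {group.Count()} samples");
}
```

A check like this can reveal whether a small maxSamples value leaves some languages under-represented.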
Examples
List<(string, Language)> trainingData = TextTranslation.GetTrainingData(
    TextTranslation.TrainingDataset.LanguageDetection_LMKit2024_09_INT,
    maxSamples: 500,
    shuffle: true,
    seed: 42);

foreach (var sample in trainingData)
{
    Console.WriteLine($"Text: {sample.Item1}, Language: {sample.Item2}");
}
Exceptions
- ArgumentException
Thrown if the specified dataset is not recognized.
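When the dataset value comes from configuration or user input, the call can be guarded with a try/catch. This sketch is illustrative only: it assumes the LM-Kit.NET package is referenced and that casting -1 produces a value the TextTranslation.TrainingDataset enumeration does not define.

```csharp
using System;
using LMKit.Translation;

// Hypothetical invalid input: -1 is assumed not to map to any defined dataset.
var dataset = (TextTranslation.TrainingDataset)(-1);

try
{
    var samples = TextTranslation.GetTrainingData(dataset);
    Console.WriteLine($"Loaded {samples.Count} samples.");
}
catch (ArgumentException ex)
{
    // Documented behavior: thrown when the dataset is not recognized.
    Console.WriteLine($"Unrecognized dataset: {ex.Message}");
}
```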