Method GetTrainingData
- Namespace: LMKit.Translation
- Assembly: LM-Kit.NET.dll
GetTrainingData(TrainingDataset, int, bool, int?)
Retrieves training data for fine-tuning language detection models from the specified dataset.
public static List<(string, Language)> GetTrainingData(TextTranslation.TrainingDataset dataset, int maxSamples = 1000, bool shuffle = true, int? seed = null)
Parameters
dataset
TextTranslation.TrainingDataset
The dataset identifier from the TextTranslation.TrainingDataset enumeration.
maxSamples
int
The maximum number of samples to retrieve. Default is 1000.
shuffle
bool
If set to true, the dataset is shuffled before samples are selected. Default is true.
seed
int?
An optional seed for the random number generator used during shuffling. If null, the shuffle operation is unseeded.
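The interaction between shuffle, seed, and maxSamples can be illustrated with a small stand-alone sketch. This is not LM-Kit's implementation: the SamplingSketch class and its Sample method are hypothetical, and the Fisher-Yates shuffle is an assumed technique chosen only to show why a fixed seed gives a repeatable selection while a null seed does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LM-Kit: mimics the documented
// shuffle / seed / maxSamples semantics on an in-memory list.
public static class SamplingSketch
{
    public static List<T> Sample<T>(
        IReadOnlyList<T> source,
        int maxSamples = 1000,
        bool shuffle = true,
        int? seed = null)
    {
        // Copy so the caller's list is left untouched.
        var items = source.ToList();

        if (shuffle)
        {
            // A seeded Random produces the same sequence on every run;
            // an unseeded one generally does not.
            var rng = seed.HasValue ? new Random(seed.Value) : new Random();

            // Fisher-Yates shuffle (assumed here for illustration).
            for (int i = items.Count - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1); // 0 <= j <= i
                (items[i], items[j]) = (items[j], items[i]);
            }
        }

        // Keep at most maxSamples items.
        return items.Take(Math.Max(0, maxSamples)).ToList();
    }
}
```

Under these assumptions, two calls with the same seed return the same samples in the same order, which is useful when a fine-tuning run needs to be repeated on identical data.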
Returns
- List<(string, Language)>
A list of tuples where each tuple consists of:
- A string representing a text sample.
- A Language enumeration value representing the corresponding language label.
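Because the result is a plain list of (text, label) tuples, it can be inspected with ordinary LINQ before fine-tuning. The sketch below counts samples per label to check class balance. The LabelStats class and CountByLabel method are hypothetical names, and the helper is kept generic over the label type so that it does not assume anything about the Language enumeration beyond what is stated above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LM-Kit: counts how many samples
// carry each label in a list of (text, label) tuples.
public static class LabelStats
{
    public static Dictionary<TLabel, int> CountByLabel<TLabel>(
        IEnumerable<(string Text, TLabel Label)> samples)
    {
        // Group by the second tuple element and count each group.
        return samples
            .GroupBy(s => s.Label)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

With a list returned by GetTrainingData, a call would presumably look like LabelStats.CountByLabel(trainingData), yielding one entry per Language value present in the sample.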
Examples
// Request up to 500 samples, shuffled with a fixed seed.
List<(string, Language)> trainingData = TextTranslation.GetTrainingData(
    TextTranslation.TrainingDataset.LanguageDetection_LMKit2024_09_INT,
    maxSamples: 500,
    shuffle: true,
    seed: 42);

// Print each text sample with its language label.
foreach (var (text, language) in trainingData)
{
    Console.WriteLine($"Text: {text}, Language: {language}");
}
Exceptions
- ArgumentException
Thrown if the specified dataset is not recognized.