Method GetTrainingData
- Namespace
- LMKit.Translation
- Assembly
- LM-Kit.NET.dll
GetTrainingData(TrainingDataset, int, bool, int?)
Retrieves training data for fine-tuning a language detection model from the specified dataset.
public static List<(string, Language)> GetTrainingData(TextTranslation.TrainingDataset dataset, int maxSamples = 1000, bool shuffle = true, int? seed = null)
Parameters
dataset
TextTranslation.TrainingDataset
The dataset from which to retrieve the training data.
maxSamples
int
The maximum number of samples to retrieve from the dataset. The default is 1000.
shuffle
bool
Indicates whether to shuffle the dataset before selecting samples. The default is true.
seed
int?
An optional seed for the random number generator used when shuffling. If null, the shuffle operation will not be seeded.
Returns
- List<(string, Language)>
A list of tuples, where each tuple contains a string (the text) and a Language (the language label).
Examples
using LMKit.Translation;
using LMKit.Model;
using System;
using System.Collections.Generic;

// Retrieve training data
List<(string, Language)> trainingData = TextTranslation.GetTrainingData(
    TextTranslation.TrainingDataset.LanguageDetection_LMKit2024_09_INT,
    maxSamples: 500,
    shuffle: true,
    seed: 42);

// Use the training data as needed
foreach (var sample in trainingData)
{
    Console.WriteLine($"Text: {sample.Item1}, Language: {sample.Item2}");
}
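The sketch below illustrates how maxSamples, shuffle, and seed could interact. It does not reproduce LM-Kit's implementation: SamplingSketch and Sample are hypothetical names introduced here, and the use of a Fisher-Yates shuffle over System.Random is an assumption. It is meant only to show why passing a fixed seed gives a repeatable selection while a null seed does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SamplingSketch
{
    // Hypothetical helper, not part of LM-Kit: optionally shuffles a copy of
    // the input (Fisher-Yates) and returns at most maxSamples items.
    public static List<T> Sample<T>(
        IReadOnlyList<T> source,
        int maxSamples = 1000,
        bool shuffle = true,
        int? seed = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (maxSamples < 0) throw new ArgumentOutOfRangeException(nameof(maxSamples));

        // Work on a copy so the caller's collection is left untouched.
        var items = new List<T>(source);

        if (shuffle)
        {
            // A fixed seed makes the permutation repeatable across calls;
            // without one, Random picks its own seed.
            var rng = seed.HasValue ? new Random(seed.Value) : new Random();
            for (int i = items.Count - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1); // 0 <= j <= i
                (items[i], items[j]) = (items[j], items[i]);
            }
        }

        // Take never returns more items than are available.
        return items.Take(maxSamples).ToList();
    }
}
```

Under these assumptions, two calls with the same input and the same seed return the same samples in the same order, and a call with shuffle set to false returns the first maxSamples items in their original order.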
Exceptions
- ArgumentException
Thrown if the dataset is not recognized.