Method LoadTrainingDataFromText

Namespace
LMKit.Finetuning
Assembly
LM-Kit.NET.dll

LoadTrainingDataFromText(string, string, Encoding)

Loads a training dataset from a plain text file.
Reads the file, splits its content into samples at each occurrence of the specified delimiter, and prepares the samples for training.

public int LoadTrainingDataFromText(string trainDataPath, string sampleStart = "<SFT>", Encoding encoding = null)

Parameters

trainDataPath string

The file path to the plain text file containing the training data.

sampleStart string

The delimiter used to separate individual samples within the training dataset. The default value is "<SFT>".

encoding Encoding

The character encoding used to read the file. If null (the default), UTF-8 is used.

Returns

int

The number of loaded training samples.

Exceptions

FileNotFoundException

Thrown when no file exists at the path given by trainDataPath.
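
As an illustration of the input this overload expects, the sketch below writes a small training file in which each sample is preceded by the default "<SFT>" delimiter, then checks that the file exists before it would be loaded. The file name, the sample texts, and the CountSamples helper are hypothetical. The helper only approximates the sample count with a plain string split, so the library's own parsing may differ. Placing the delimiter on its own line before each sample is likewise an assumption. The call to LoadTrainingDataFromText is shown as a comment because the object that exposes it (named finetuning here, also hypothetical) is not described on this page.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical file name and sample texts, chosen for illustration only.
string path = Path.Combine(Path.GetTempPath(), "train-samples.txt");
const string sampleStart = "<SFT>"; // the documented default delimiter

string[] samples =
{
    "Question: What is LoRA?\nAnswer: A low-rank adaptation technique.",
    "Question: What is a token?\nAnswer: A unit of text seen by the model.",
    "Question: What is an epoch?\nAnswer: One full pass over the training data.",
};

// Assumption: the delimiter sits on its own line before each sample.
var builder = new StringBuilder();
foreach (string sample in samples)
{
    builder.Append(sampleStart).Append('\n').Append(sample).Append('\n');
}

// Write the file as UTF-8 without a byte order mark.
File.WriteAllText(path, builder.ToString(), new UTF8Encoding(false));

// Guard against the documented FileNotFoundException before loading.
if (!File.Exists(path))
{
    throw new FileNotFoundException("Training data file not found.", path);
}

int expectedCount = CountSamples(File.ReadAllText(path, Encoding.UTF8), sampleStart);
Console.WriteLine($"Samples in file: {expectedCount}");

// With an already constructed object exposing the documented method
// (named 'finetuning' here, hypothetical), the call would look like:
// int loaded = finetuning.LoadTrainingDataFromText(path, sampleStart, Encoding.UTF8);

// Hypothetical helper: approximates the sample count with a plain split.
// This is not the library's parser.
int CountSamples(string text, string delimiter) =>
    text.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries)
        .Count(part => part.Trim().Length > 0);
```

Comparing the locally computed count with the value returned by LoadTrainingDataFromText is a cheap sanity check that the delimiter in the file matches the sampleStart argument.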

LoadTrainingDataFromText(Stream, string, Encoding)

Loads a training dataset from a stream of plain text. This method reads the training data samples from the stream, splits them at each occurrence of the specified delimiter, and prepares them for training.

public int LoadTrainingDataFromText(Stream trainingData, string sampleStart = "<SFT>", Encoding encoding = null)

Parameters

trainingData Stream

A stream containing the plain text training data.

sampleStart string

The delimiter used to separate individual samples within the training dataset. The default value is "<SFT>".

encoding Encoding

The character encoding used to read the stream. If null (the default), UTF-8 is used.

Returns

int

The number of loaded training samples.

Exceptions

ArgumentException

Thrown when the training data stream is null or empty.

InvalidDataException

Thrown when the data format is invalid or the delimiter is not found.
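
The exception conditions above suggest a simple pre-flight check before passing a stream to this overload. The sketch below builds an in-memory stream from a string and verifies that it is non-empty and contains the delimiter. The IsLoadable helper and the sample payload are hypothetical. The helper is not part of LM-Kit.NET and assumes a seekable stream so the position can be restored after reading. The call to LoadTrainingDataFromText is again shown only as a comment on a hypothetical finetuning object.

```csharp
using System;
using System.IO;
using System.Text;

const string sampleStart = "<SFT>"; // the documented default delimiter

// Hypothetical payload with two samples, each preceded by the delimiter.
string payload =
    "<SFT>\nTranslate to French: hello\nbonjour\n" +
    "<SFT>\nTranslate to French: thank you\nmerci\n";

using var trainingData = new MemoryStream(Encoding.UTF8.GetBytes(payload));

bool ready = IsLoadable(trainingData, sampleStart);
bool emptyRejected = !IsLoadable(new MemoryStream(), sampleStart);
Console.WriteLine($"Stream ready: {ready}, empty stream rejected: {emptyRejected}");

// With an already constructed object exposing the documented method
// (named 'finetuning' here, hypothetical), the call would look like:
// int loaded = finetuning.LoadTrainingDataFromText(trainingData, sampleStart);

// Hypothetical helper: returns true when the stream is non-null, seekable,
// non-empty, and contains the delimiter. The stream position is restored,
// so the caller can still read the stream from where it was.
bool IsLoadable(Stream input, string delimiter, Encoding encoding = null)
{
    if (input == null || !input.CanSeek || input.Length == 0)
    {
        return false;
    }

    encoding ??= Encoding.UTF8; // mirrors the documented UTF-8 default
    long start = input.Position;

    // leaveOpen: true keeps the underlying stream usable after the reader is disposed.
    using var reader = new StreamReader(input, encoding, true, 1024, leaveOpen: true);
    string content = reader.ReadToEnd();
    input.Position = start;

    return content.Contains(delimiter, StringComparison.Ordinal);
}
```

Restoring the stream position matters here: a stream that has already been read to the end would otherwise look empty to whatever consumes it next.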