Method Tokenize

Namespace
LMKit.Tokenization
Assembly
LM-Kit.NET.dll

Tokenize(string)

Tokenizes the given text into an array of token identifiers. Special tokens are added and parsed based on the configuration.

public int[] Tokenize(string text)

Parameters

text string

The text to tokenize.

Returns

int[]

An array of integers where each entry represents a token identifier.

Examples

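using LMKit.Model;
using System;

// Load the model whose vocabulary will tokenize the text.
LM model = LM.LoadFromModelID("llama-3.2-1b");

// Convert the text into an array of token identifiers.
int[] tokens = model.Vocabulary.Tokenize("Hello, world!");

// Print the number of tokens, then the identifiers themselves.
Console.WriteLine($"Token count: {tokens.Length}");
Console.WriteLine($"Tokens: [{string.Join(", ", tokens)}]");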
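
A common use of the returned array is checking a prompt's length before sending it to the model. The following is a minimal sketch that relies only on the calls shown in the example above. The `MaxPromptTokens` budget and the prompt text are hypothetical values chosen for illustration, not limits defined by LM-Kit.

```csharp
using LMKit.Model;
using System;

// Same model ID as in the example above; substitute the model you actually use.
LM model = LM.LoadFromModelID("llama-3.2-1b");

// Hypothetical token budget, chosen for illustration only.
const int MaxPromptTokens = 512;

string prompt = "Summarize the following paragraph in one sentence.";

// Tokenize the prompt; the array length is the prompt's token count.
int[] tokens = model.Vocabulary.Tokenize(prompt);

if (tokens.Length > MaxPromptTokens)
{
    Console.WriteLine($"Prompt too long: {tokens.Length} tokens (budget {MaxPromptTokens}).");
}
else
{
    Console.WriteLine($"Prompt fits: {tokens.Length} of {MaxPromptTokens} tokens used.");
}
```

Because special tokens are added based on the configuration, the count may be higher than the number of tokens produced by the visible text alone.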