Enum VocabularyMode
- Namespace: LMKit.Tokenization
- Assembly: LM-Kit.NET.dll
Specifies the vocabulary modes used by different tokenizer models.
public enum VocabularyMode
Fields
NONE = 0
No vocabulary mode is specified. Used for models without a vocabulary.
SPM = 1
Uses a SentencePiece Model (SPM) vocabulary. This mode is based on byte-level Byte Pair Encoding (BPE) with byte fallback and is typically used by the LLaMA tokenizer.
BPE = 2
Uses a Byte Pair Encoding (BPE) vocabulary. This mode is based on byte-level BPE and is commonly used by the GPT-2 tokenizer.
WPM = 3
Uses a WordPiece Model (WPM) vocabulary. This mode is based on WordPiece and is typically used by the BERT tokenizer.
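
Example

The following is a minimal sketch of how calling code might branch on a VocabularyMode value. To stay self-contained, it declares a local stand-in enum that mirrors the field names and values documented above; in a real project you would reference LMKit.Tokenization.VocabularyMode from LM-Kit.NET.dll instead. The VocabularyModeDemo class and its Describe helper are hypothetical and are not part of LM-Kit.

```csharp
using System;

// Local stand-in mirroring the documented fields, so this sketch compiles
// without LM-Kit.NET.dll. In real code, use LMKit.Tokenization.VocabularyMode.
public enum VocabularyMode
{
    NONE = 0,
    SPM = 1,
    BPE = 2,
    WPM = 3
}

public static class VocabularyModeDemo
{
    // Hypothetical helper (not an LM-Kit API): maps a mode to a short label.
    public static string Describe(VocabularyMode mode) => mode switch
    {
        VocabularyMode.NONE => "no vocabulary",
        VocabularyMode.SPM => "SentencePiece (LLaMA-style)",
        VocabularyMode.BPE => "byte-level BPE (GPT-2-style)",
        VocabularyMode.WPM => "WordPiece (BERT-style)",
        // Guard against values outside the documented range, e.g. from a cast.
        _ => throw new ArgumentOutOfRangeException(nameof(mode), mode, "Unknown vocabulary mode")
    };

    public static void Main()
    {
        // List every defined mode with its numeric value and label.
        foreach (VocabularyMode mode in Enum.GetValues(typeof(VocabularyMode)))
        {
            Console.WriteLine($"{(int)mode} {mode}: {Describe(mode)}");
        }
    }
}
```

The default arm of the switch throws rather than returning a fallback label, so an unexpected value (for example, an integer cast that matches no field) fails immediately instead of being silently treated as a known mode.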