Table of Contents

Namespace LMKit.TextGeneration.Sampling

Classes

Grammar

Represents a grammar used by text generation models to define and enforce syntax rules during the generation process. The Grammar class enables constrained output generation by specifying the allowed syntax and structure of the generated text, letting developers ensure that the model's output adheres to a predefined format such as JSON, arithmetic expressions, or a custom-defined grammar. A minimal sketch of the underlying idea follows the list below.

Benefits of using the Grammar class include:

  • Enforcing syntactic correctness in the generated output.
  • Restricting the output to a specific format or language.
  • Reducing the likelihood of invalid or nonsensical outputs.
  • Facilitating the extraction and parsing of generated data.
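
To make the mechanism concrete, here is a minimal, language-agnostic sketch of grammar-constrained decoding in Python. It is not LMKit's implementation: the is_allowed predicate is a hypothetical stand-in for a real grammar matcher, and real grammars track parser state across steps.

    import math
    import random

    def constrained_sample(logits, vocab, is_allowed):
        # Mask every token the grammar would reject at this step.
        masked = [l if is_allowed(tok) else float("-inf")
                  for l, tok in zip(logits, vocab)]
        # Softmax over the survivors, then draw one proportionally.
        m = max(masked)
        weights = [math.exp(l - m) for l in masked]
        return random.choices(vocab, weights=weights, k=1)[0]

    # Toy rule: only digit tokens are allowed (e.g., forcing a numeric answer).
    vocab = ["7", "x", "3", "cat"]
    logits = [1.0, 2.5, 0.3, 1.8]
    print(constrained_sample(logits, vocab, lambda t: t.isdigit()))
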
GreedyDecoding

Handles the greedy decoding strategy.
This algorithm always selects the token with the highest probability, making generation fully deterministic.
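
As a rough illustration (plain Python, independent of the LMKit API), greedy decoding reduces to an argmax over the logits:

    def greedy_decode(logits):
        # Deterministically pick the index of the largest logit (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])

    print(greedy_decode([0.1, 2.3, 1.7]))  # -> 1, every run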

LogitBias

Handles the rules for applying logit bias during token sampling.
Logit bias makes it possible to block specific text chunks entirely, or to steer the model by increasing or decreasing the likelihood of particular words appearing.
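
Conceptually, a logit bias is a per-token offset added to the model's raw scores before sampling. The following sketch is illustrative Python, not the LogitBias API; the token indices and bias values are made up:

    def apply_logit_bias(logits, bias):
        # Positive offsets promote a token, negative ones suppress it,
        # and float("-inf") forbids it entirely.
        return [l + bias.get(i, 0.0) for i, l in enumerate(logits)]

    logits = [1.2, 0.4, 2.0]
    print(apply_logit_bias(logits, {2: float("-inf"), 0: 3.0}))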

Mirostat2Sampling

Specifies version 2 of the Mirostat sampling strategy, a neural text decoding algorithm that directly controls perplexity.
Mirostat keeps the quality of generated text within a predefined range throughout the generation process, balancing coherence and diversity while avoiding the two common failure modes of subpar output: excessive repetition ("boredom traps") and loss of coherence ("confusion traps").
The Mirostat algorithm is described in the paper https://arxiv.org/abs/2007.14966
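
The feedback loop at the heart of Mirostat v2 can be sketched in a few lines of Python. This follows the paper's description rather than the Mirostat2Sampling implementation, and assumes strictly positive token probabilities:

    import math
    import random

    def mirostat_v2_step(probs, mu, tau=5.0, eta=0.1):
        # Keep only tokens whose surprise (-log2 p) does not exceed mu.
        kept = [(i, p) for i, p in enumerate(probs) if -math.log2(p) <= mu]
        if not kept:  # fall back to the single most likely token
            kept = [max(enumerate(probs), key=lambda ip: ip[1])]
        # Sample from the truncated, renormalized distribution.
        total = sum(p for _, p in kept)
        r, acc = random.random() * total, 0.0
        chosen, chosen_p = kept[-1]
        for i, p in kept:
            acc += p
            if r <= acc:
                chosen, chosen_p = i, p
                break
        # Feedback: nudge mu toward the target surprise tau.
        observed = -math.log2(chosen_p / total)
        mu -= eta * (observed - tau)
        return chosen, mu

Calling mirostat_v2_step once per generated token, threading mu through, keeps the observed surprise (and hence the perplexity) near the target tau.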

MirostatSampling

Specifies the original (version 1) Mirostat sampling strategy, a neural text decoding algorithm that directly controls perplexity.
Like Mirostat2Sampling (sketched above), it keeps the quality of generated text within a predefined range throughout the generation process, balancing coherence and diversity while avoiding both excessive repetition ("boredom traps") and loss of coherence ("confusion traps").
The Mirostat algorithm is described in the paper https://arxiv.org/abs/2007.14966

RandomSampling

Handles the random sampling strategy (also known as temperature-based sampling).
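
In essence, temperature-based sampling rescales the logits before a softmax draw; the temperature trades determinism against diversity. A minimal Python sketch, independent of the RandomSampling API:

    import math
    import random

    def temperature_sample(logits, temperature=0.8):
        # Temperature < 1 sharpens the distribution (more deterministic);
        # temperature > 1 flattens it (more diverse).
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]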

RepetitionPenalty

Handles the rules for repetition penalties applied during text completion.
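
One widespread convention for repetition penalties (introduced with the CTRL model; LMKit's exact formula may differ) rescales the logits of tokens that have already appeared:

    def apply_repetition_penalty(logits, previous_tokens, penalty=1.1):
        # Divide positive logits and multiply negative ones, so that a
        # penalty > 1 always makes a repeated token less likely.
        out = list(logits)
        for t in set(previous_tokens):
            out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
        return out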

TokenSampling

Handles the sampling strategy used during text completion.

TopNSigmaSampling

Implements Top-nσ sampling, a text generation strategy introduced in “Top-nσ: Not All Logits Are You Need”.

This method limits next-token selection to tokens whose pre-softmax logits fall within n * σ of the maximum logit, where σ is the standard deviation of the logits. By filtering candidate tokens directly against this statistical threshold, Top-nσ sampling keeps the sampling space stable and robust to temperature scaling, which makes it particularly effective in reasoning tasks even when higher temperatures are used.

A higher TopNSigma value (e.g., 5) expands the sampling space to include more tokens (potentially noisier), whereas a lower value (e.g., 1) restricts the focus to the most competitive tokens.
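
The filtering rule translates directly into code. The sketch below follows the paper's formulation and is not the TopNSigmaSampling implementation:

    import math
    import random

    def top_n_sigma_sample(logits, n=1.0, temperature=1.0):
        # Statistics of the raw (pre-softmax) logits.
        mean = sum(logits) / len(logits)
        sigma = math.sqrt(sum((l - mean) ** 2 for l in logits) / len(logits))
        top = max(logits)
        # Keep only tokens within n standard deviations of the best logit.
        kept = [(i, l) for i, l in enumerate(logits) if l >= top - n * sigma]
        # Temperature rescales only the survivors, which is why the
        # sampling space stays stable as temperature rises.
        weights = [math.exp((l - top) / temperature) for _, l in kept]
        return random.choices([i for i, _ in kept], weights=weights, k=1)[0]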

Enums

Grammar.PredefinedGrammar

Defines the types of predefined grammar rules available for use in text generation.

LogitBiasSetMode

Defines the modes for updating bias values in a bias configuration.

RandomSampling.RandomSamplers

Enumerates the samplers available for use within the RandomSampling strategy, each offering a distinct selection mechanism.