Table of Contents

Namespace LMKit.TextGeneration.Sampling

Classes

Grammar

Represents a grammar used by text generation models to define and enforce syntax rules during generation. The Grammar class enables constrained output generation by specifying the allowed syntax and structure of the generated text, allowing developers to ensure that the model's output adheres to a predefined format, such as JSON, arithmetic expressions, or custom-defined grammars.

Benefits of using the Grammar class include:

  • Enforcing syntactic correctness in the generated output.
  • Restricting the output to a specific format or language.
  • Reducing the likelihood of invalid or nonsensical outputs.
  • Facilitating the extraction and parsing of generated data.
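
A minimal conceptual sketch of the idea behind constrained decoding is shown below; it assumes the grammar has already been translated into the set of token ids allowed at the current step. The class, method, and parameter names are illustrative only and are not part of the LMKit API.

```csharp
using System;
using System.Collections.Generic;

static class GrammarMaskSketch
{
    // Masks every token the grammar does not allow at the current step, so that
    // whichever sampler runs afterwards can only continue the permitted structure
    // (for example, the next characters of a valid JSON document).
    public static void MaskDisallowedTokens(double[] logits, ISet<int> allowedTokenIds)
    {
        for (int id = 0; id < logits.Length; id++)
        {
            if (!allowedTokenIds.Contains(id))
            {
                logits[id] = double.NegativeInfinity;
            }
        }
    }
}
```
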
GreedyDecoding

Handles the greedy decoding strategy.
This algorithm always selects the token with the highest probability, making generation fully deterministic.
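
A minimal conceptual sketch of greedy decoding over an array of logits is shown below; the names are illustrative only and not part of the LMKit API.

```csharp
using System.Collections.Generic;

static class GreedyDecodingSketch
{
    // Greedy decoding: always return the index of the highest logit,
    // so the same prompt always produces the same completion.
    public static int PickNextToken(IReadOnlyList<double> logits)
    {
        int best = 0;
        for (int i = 1; i < logits.Count; i++)
        {
            if (logits[i] > logits[best])
            {
                best = i;
            }
        }
        return best;
    }
}
```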

LogitBias

Handles the rules for applying logit bias during token sampling.
Logit bias can be used to prevent specific pieces of text from being generated, or to increase or decrease the likelihood of particular words appearing.
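
Conceptually, a logit bias is a per-token offset added to the logits before sampling. The sketch below illustrates this under the assumption that biases are keyed by token id; it is not the LMKit API.

```csharp
using System.Collections.Generic;

static class LogitBiasSketch
{
    // Adds a per-token offset to the logits: a large negative value effectively
    // bans a token, while a positive value makes it more likely to be sampled.
    public static void Apply(double[] logits, IReadOnlyDictionary<int, double> biasByTokenId)
    {
        foreach (var entry in biasByTokenId)
        {
            if (entry.Key >= 0 && entry.Key < logits.Length)
            {
                logits[entry.Key] += entry.Value;
            }
        }
    }
}
```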

Mirostat2Sampling

Specifies version 2 of the Mirostat sampling strategy, a neural text decoding algorithm that directly controls perplexity.
Mirostat is designed to keep the quality of generated text within a predefined range throughout the generation process.
It aims to balance coherence and diversity, avoiding the low-quality output caused by excessive repetition ("boredom traps") or by a loss of coherence ("confusion traps").
The Mirostat algorithm is described in the paper https://arxiv.org/abs/2007.14966
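
The core of Mirostat v2 is a feedback loop: tokens whose surprise (negative log2 probability) exceeds a running threshold are discarded, one token is sampled from the rest, and the threshold is then nudged so that the observed surprise stays close to a target value. The sketch below illustrates that loop; the names and the fallback handling are illustrative only and not part of the LMKit API.

```csharp
using System;
using System.Collections.Generic;

static class MirostatV2Sketch
{
    // One decoding step: filter by surprise, sample, then update the threshold mu
    // so the observed surprise tracks the target tau (eta is the learning rate).
    public static int Step(double[] probabilities, ref double mu,
                           double tau, double eta, Random rng)
    {
        var candidates = new List<int>();
        double total = 0;
        for (int i = 0; i < probabilities.Length; i++)
        {
            if (probabilities[i] > 0 && -Math.Log2(probabilities[i]) <= mu)
            {
                candidates.Add(i);
                total += probabilities[i];
            }
        }

        // Fallback if every token was filtered out: keep the most likely one.
        if (candidates.Count == 0)
        {
            int best = 0;
            for (int i = 1; i < probabilities.Length; i++)
                if (probabilities[i] > probabilities[best]) best = i;
            candidates.Add(best);
            total = probabilities[best];
        }

        // Sample from the renormalized candidates.
        double r = rng.NextDouble() * total;
        int chosen = candidates[candidates.Count - 1];
        foreach (int i in candidates)
        {
            r -= probabilities[i];
            if (r <= 0) { chosen = i; break; }
        }

        // Feedback step: move mu toward keeping the observed surprise near tau.
        mu -= eta * (-Math.Log2(probabilities[chosen]) - tau);
        return chosen;
    }
}
```

In the paper and in common implementations, the threshold mu is initialized to twice the target surprise tau.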

MirostatSampling

Specifies the Mirostat sampling strategy, a neural text decoding algorithm that directly controls perplexity.
Mirostat is designed to keep the quality of generated text within a predefined range throughout the generation process.
It aims to balance coherence and diversity, avoiding the low-quality output caused by excessive repetition ("boredom traps") or by a loss of coherence ("confusion traps").
The Mirostat algorithm is described in the paper https://arxiv.org/abs/2007.14966

RandomSampling

Handles the random sampling strategy (also known as temperature-based sampling).
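
In temperature-based sampling, the logits are divided by the temperature before the softmax, and the next token is drawn from the resulting distribution: lower temperatures sharpen the distribution toward the most likely tokens, higher temperatures flatten it. The sketch below illustrates this; the names are illustrative only and not part of the LMKit API.

```csharp
using System;
using System.Linq;

static class TemperatureSamplingSketch
{
    // Softmax over temperature-scaled logits (shifted by the max for numerical
    // stability), followed by a single draw from the resulting distribution.
    public static int Sample(double[] logits, double temperature, Random rng)
    {
        double max = logits.Max();
        double[] weights = logits.Select(l => Math.Exp((l - max) / temperature)).ToArray();
        double total = weights.Sum();

        double r = rng.NextDouble() * total;
        for (int i = 0; i < weights.Length; i++)
        {
            r -= weights[i];
            if (r <= 0) return i;
        }
        return weights.Length - 1;
    }
}
```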

RepetitionPenalty

Handles the rules for repetition penalties applied during text completion.
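
A common way to implement a repetition penalty is to rescale the logits of tokens that already appear in the context, as in the sketch below; the exact formulation used by a given library may differ, and these names are illustrative only, not the LMKit API.

```csharp
using System.Collections.Generic;
using System.Linq;

static class RepetitionPenaltySketch
{
    // With a penalty greater than 1.0, tokens already present in the context
    // become less likely: positive logits are shrunk, negative logits are
    // pushed further down.
    public static void Apply(double[] logits, IEnumerable<int> contextTokenIds, double penalty)
    {
        foreach (int id in contextTokenIds.Distinct())
        {
            if (id >= 0 && id < logits.Length)
            {
                logits[id] = logits[id] > 0 ? logits[id] / penalty : logits[id] * penalty;
            }
        }
    }
}
```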

TokenSampling

Handles the sampling strategy used during text completion.

Enums

Grammar.PredefinedGrammar

Defines the types of predefined grammar rules available for use in text generation.

LogitBiasSetMode

Defines the modes for updating bias values in a bias configuration.

RandomSampling.RandomSamplers

Lists the samplers available for use with the RandomSampling strategy, each offering a different selection mechanism.