Control Token Sampling with Dynamic Strategies
Every token a language model generates is chosen from a probability distribution over its entire vocabulary. The sampling strategy determines how that choice is made: should the model always pick the most likely token (deterministic), explore less likely tokens (creative), or something in between? LM-Kit.NET gives you full control over this process through RandomSampling, RepetitionPenalty, LogitBias, and raw logit manipulation via the BeforeTokenSampling and AfterTokenSampling events. This guide builds a sampling control system that adapts its strategy per prompt, manipulates token probabilities in real time, and implements custom guardrails at the token level.
Why Sampling Control Matters
Two production problems that sampling control solves:
- Deterministic output for data extraction and classification. When extracting structured data or classifying text, you need the model to produce the same output for the same input every time. Setting temperature near zero with a fixed seed eliminates randomness entirely, turning the LLM into a deterministic function; the sketch after this list shows why near-zero temperature collapses the choice onto the single most likely token. This is critical for testing, auditing, and reproducible pipelines.
- Preventing degenerate output patterns. Without repetition penalties, models fall into loops ("the the the the...") or repeat entire paragraphs verbatim. Without MinP filtering, they occasionally emit nonsensical low-probability tokens. Tuning these parameters eliminates pathological outputs while preserving natural language quality.
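The temperature mechanics behind the first point are worth seeing once: the sampler divides every logit by the temperature before applying softmax, so low values sharpen the distribution and high values flatten it. The sketch below is plain C# over a toy three-token vocabulary; it uses no LM-Kit.NET API and the numbers are made up:
// Toy demonstration of temperature scaling: p_i = softmax(logit_i / T).
// Lower T sharpens the distribution; higher T flattens it.
double[] logits = { 2.0, 1.0, 0.1 };
foreach (double t in new[] { 0.1, 1.0 })
{
    double[] weights = logits.Select(l => Math.Exp(l / t)).ToArray();
    double sum = weights.Sum();
    Console.WriteLine($"T={t}: " + string.Join(", ", weights.Select(w => (w / sum).ToString("F3"))));
}
// T=0.1 → 1.000, 0.000, 0.000 (near-greedy: the top token takes all the mass)
// T=1.0 → 0.659, 0.242, 0.099 (probability mass spreads across candidates)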
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n SamplingControl
cd SamplingControl
dotnet add package LM-Kit.NET
Step 2: Configure RandomSampling Parameters
RandomSampling controls the core sampling behavior: temperature, top-p (nucleus), top-k, min-p, and the order in which these filters are applied:
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;
using LMKit.TextGeneration.Events;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Deterministic sampling (near-zero temperature)
// ──────────────────────────────────────
var deterministicSampling = new RandomSampling
{
Temperature = 0.01f, // Near-zero: always picks the most likely token
TopP = 1.0f, // Disabled: no nucleus filtering
TopK = 1, // Only consider the single most likely token
MinP = 0.0f, // Disabled
Seed = 42 // Fixed seed for reproducibility
};
var deterministicChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = deterministicSampling
};
Console.WriteLine("── Deterministic Output ──");
string result1 = deterministicChat.Submit("Classify this text as positive, negative, or neutral: 'The product works but the UI is terrible.'");
string result2 = deterministicChat.Submit("Classify this text as positive, negative, or neutral: 'The product works but the UI is terrible.'");
Console.WriteLine($"Run 1: {result1}");
Console.WriteLine($"Run 2: {result2}");
Console.WriteLine($"Identical: {result1 == result2}");
// ──────────────────────────────────────
// 3. Creative sampling (high temperature)
// ──────────────────────────────────────
var creativeSampling = new RandomSampling
{
Temperature = 0.9f, // High: explores diverse tokens
TopP = 0.95f, // Nucleus: keeps top 95% probability mass
TopK = 80, // Consider top 80 tokens
MinP = 0.02f, // Filter out tokens below 2% of the top token's probability
};
var creativeChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 512,
SamplingMode = creativeSampling
};
Console.WriteLine("\n── Creative Output ──");
string story = creativeChat.Submit("Write a one-paragraph opening for a mystery novel set in a space station.");
Console.WriteLine(story);
Parameter Reference
| Parameter | Range | Default | Effect |
|---|---|---|---|
| Temperature | 0.0 to 1.0 | 0.8 | Lower = more deterministic; higher = more creative |
| TopP | 0.0 to 1.0 | 0.95 | Nucleus sampling: keeps tokens whose cumulative probability reaches this threshold |
| TopK | 1 to 1000 | 40 | Only consider the top K most likely tokens |
| MinP | 0.0 to 1.0 | 0.05 | Filter tokens whose probability is below this fraction of the top token's probability |
| Seed | uint or null | null | Fixed seed for reproducible output |
| DynamicTemperatureRange | 0.0 to 1.0 | 0.0 | Adjusts temperature dynamically based on token entropy |
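MinP is the least intuitive of these filters, so here is the rule it applies in isolation: keep only tokens whose probability is at least MinP times the top candidate's probability. The sketch below uses made-up numbers and no LM-Kit.NET API:
// Toy MinP filter: keep tokens with p >= MinP * p_max.
float minP = 0.05f;
var candidates = new (string Token, float P)[]
{
    ("blue", 0.60f), ("green", 0.25f), ("red", 0.10f),
    ("qux", 0.02f), // 0.02 < 0.05 * 0.60 = 0.03, so this one is dropped
};
float threshold = minP * candidates.Max(c => c.P);
var kept = candidates.Where(c => c.P >= threshold).Select(c => c.Token);
Console.WriteLine(string.Join(", ", kept)); // blue, green, red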
Step 3: Control Sampler Ordering
The order in which sampling filters are applied changes the result. LM-Kit.NET lets you specify the exact sequence:
// ──────────────────────────────────────
// 4. Custom sampler pipeline order
// ──────────────────────────────────────
var customOrder = new RandomSampling
{
Temperature = 0.7f,
TopK = 50,
TopP = 0.9f,
MinP = 0.05f,
// Apply filters in this exact order:
SamplersSequence = new[]
{
RandomSamplers.TopK, // First: narrow to top 50 candidates
RandomSamplers.MinP, // Then: remove low-probability outliers
RandomSamplers.TopP, // Then: nucleus filter on remaining
RandomSamplers.Temperature, // Finally: apply temperature scaling
}
};
var orderedChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = customOrder
};
Console.WriteLine("\n── Custom Sampler Order ──");
string orderedResult = orderedChat.Submit("Suggest 3 creative names for a coffee shop in Tokyo.");
Console.WriteLine(orderedResult);
The default order is TopK → TailFree → LocallyTypical → TopP → MinP → Temperature. Changing the order lets you fine-tune how aggressively the candidate set is filtered before temperature scaling.
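To make the ordering effect concrete, the toy sketch below compares TopK → TopP against TopP → TopK on the same distribution. It assumes each stage renormalizes the surviving probabilities, as llama.cpp-style pipelines do; treat it as a mental model rather than LM-Kit.NET's exact internals:
// Toy comparison of sampler ordering on a sorted (descending) distribution.
var probs = new List<double> { 0.30, 0.25, 0.20, 0.15, 0.10 };
Console.WriteLine(TopP(TopK(probs, 3), 0.7).Count); // TopK first: 2 survivors
Console.WriteLine(TopK(TopP(probs, 0.7), 3).Count); // TopP first: 3 survivors
static List<double> TopK(List<double> p, int k) => Renorm(p.Take(k).ToList());
static List<double> TopP(List<double> p, double threshold)
{
    var kept = new List<double>();
    double cumulative = 0;
    foreach (double x in p)
    {
        kept.Add(x);
        cumulative += x;
        if (cumulative >= threshold) break; // nucleus threshold reached
    }
    return Renorm(kept);
}
static List<double> Renorm(List<double> p)
{
    double sum = p.Sum();
    return p.Select(x => x / sum).ToList();
}
Applying TopK first shrinks the pool to three candidates and renormalization inflates their probabilities, so the 0.7 nucleus is reached after only two tokens; applying TopP first needs three tokens to reach 0.7, and TopK=3 then keeps all of them.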
Step 4: Prevent Repetition with RepetitionPenalty
RepetitionPenalty penalizes tokens that have already appeared in the generated text, preventing the model from looping or repeating phrases:
// ──────────────────────────────────────
// 5. Repetition penalty configuration
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Configure repetition penalty
chat.RepetitionPenalty.TokenCount = 256; // Look back 256 tokens for repeats
chat.RepetitionPenalty.RepeatPenalty = 1.15f; // Penalize repeated tokens by 15%
chat.RepetitionPenalty.FrequencyPenalty = 0.1f; // Additional penalty based on frequency
chat.RepetitionPenalty.PresencePenalty = 0.1f; // Additional penalty for any token that appeared at all
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.WriteLine("\n── Repetition-Controlled Output ──\n");
Console.Write("Assistant: ");
chat.Submit("Write a detailed explanation of how neural networks learn through backpropagation.");
Console.WriteLine("\n");
RepetitionPenalty Parameters
| Parameter | Range | Default | Effect |
|---|---|---|---|
| TokenCount | 0 to 2048 | 0 (disabled) | How many previous tokens to check for repeats |
| RepeatPenalty | 0.0 to 2.0 | 1.1 | Multiplicative penalty for repeated tokens |
| FrequencyPenalty | -2.0 to 2.0 | 0.0 | Additive penalty proportional to token frequency |
| PresencePenalty | -2.0 to 2.0 | 0.0 | Additive penalty for any previously seen token |
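As a mental model for how the three penalties interact, the sketch below follows the llama.cpp-style convention: RepeatPenalty divides positive logits (or multiplies negative ones), then the frequency and presence terms are subtracted. LM-Kit.NET's internal formula may differ in detail:
// Hypothetical helper showing the conventional penalty combination
// (not LM-Kit.NET's literal implementation).
static float PenalizedLogit(float logit, int occurrences,
    float repeatPenalty, float frequencyPenalty, float presencePenalty)
{
    if (occurrences == 0) return logit; // outside the lookback window: untouched
    logit = logit > 0 ? logit / repeatPenalty : logit * repeatPenalty;
    return logit - (frequencyPenalty * occurrences + presencePenalty);
}
// A token with logit 2.0 that already appeared 3 times in the window:
Console.WriteLine(PenalizedLogit(2.0f, 3, 1.15f, 0.1f, 0.1f)); // ≈ 1.34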
Step 5: Steer Output with LogitBias
LogitBias lets you boost or suppress specific words and phrases before sampling. This is powerful for steering the model toward (or away from) specific terminology without changing the prompt:
// ──────────────────────────────────────
// 6. Steer output with logit biases
// ──────────────────────────────────────
var biasChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Boost certain terms
biasChat.LogitBias.AddTextChunkBias("sustainable", 3.0f);
biasChat.LogitBias.AddTextChunkBias("eco-friendly", 3.0f);
biasChat.LogitBias.AddTextChunkBias("renewable", 2.5f);
// Suppress certain terms
biasChat.LogitBias.AddTextChunkBias("cheap", -5.0f);
biasChat.LogitBias.AddTextChunkBias("expensive", -5.0f);
Console.WriteLine("── Biased Output (sustainability focus) ──");
string biasedResult = biasChat.Submit("Describe the benefits of solar energy for homeowners.");
Console.WriteLine(biasedResult);
// ──────────────────────────────────────
// 7. Disallow specific text chunks entirely
// ──────────────────────────────────────
var restrictedChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Block competitor names from appearing in output
restrictedChat.LogitBias.DisallowTextChunk("CompetitorBrand");
restrictedChat.LogitBias.DisallowTextChunk("RivalProduct");
Console.WriteLine("\n── Restricted Output (blocked terms) ──");
string restrictedResult = restrictedChat.Submit("What are the best tools for project management?");
Console.WriteLine(restrictedResult);
// ──────────────────────────────────────
// 8. Allow only numeric/letter output
// ──────────────────────────────────────
var numericChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 32,
SamplingMode = new RandomSampling { Temperature = 0.3f }
};
numericChat.LogitBias.AllowOnlyIntegerTextChunks(allowSpacing: true);
Console.WriteLine("\n── Numeric-Only Output ──");
string numericResult = numericChat.Submit("What is 42 * 17?");
Console.WriteLine($"Result: {numericResult}");
Step 6: Manipulate Logits in Real Time
For the finest-grained control, hook into the BeforeTokenSampling and AfterTokenSampling events to inspect and modify the raw probability distribution at every token generation step:
// ──────────────────────────────────────
// 9. Real-time logit manipulation
// ──────────────────────────────────────
var advancedChat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Track generation metrics
var perplexities = new List<double>();
var tokenProbabilities = new List<float>();
advancedChat.BeforeTokenSampling += (sender, e) =>
{
// Record the perplexity at each step
perplexities.Add(e.Perplexity);
// Example: if perplexity is very high (model is uncertain),
// we could stop generation early
if (e.Perplexity > 100.0 && e.GeneratedTokens.Count > 50)
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"\n [High uncertainty detected: perplexity {e.Perplexity:F1}]");
Console.ResetColor();
// Uncomment to stop: e.Stop = true;
}
};
advancedChat.AfterTokenSampling += (sender, e) =>
{
// Record the probability of each chosen token
tokenProbabilities.Add(e.TokenProbability);
// Example: replace very low-probability tokens
// This acts as a safety net against hallucinations
if (e.TokenProbability < 0.001f && e.ContextRemainingSpace > 10)
{
// Get the highest-probability alternative
int topToken = e.GetTokenCandidateByRank(0);
float topProb = e.GetTokenCandidateProbabilityByRank(0);
// Optionally override the sampled token
// e.Token = topToken;
}
};
advancedChat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("\n── Real-Time Logit Analysis ──\nAssistant: ");
advancedChat.Submit("Explain three key principles of clean code architecture.");
Console.WriteLine();
// Display generation statistics
if (perplexities.Count > 0)
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"\n── Generation Statistics ──");
Console.WriteLine($" Tokens generated: {perplexities.Count}");
Console.WriteLine($" Avg perplexity: {perplexities.Average():F2}");
Console.WriteLine($" Max perplexity: {perplexities.Max():F2}");
Console.WriteLine($" Avg token probability: {tokenProbabilities.Average():F4}");
Console.WriteLine($" Min token probability: {tokenProbabilities.Min():F6}");
Console.ResetColor();
}
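For reference when reading these statistics: perplexity over N tokens with chosen-token probabilities p_i is conventionally exp(-(1/N) Σ ln p_i). Assuming e.Perplexity is derived the same way, you can cross-check it from the probabilities recorded above:
// Standard perplexity from the recorded token probabilities:
// PPL = exp(-(1/N) * Σ ln p_i); lower means the model was more confident.
double recomputedPpl = Math.Exp(-tokenProbabilities.Average(p => Math.Log(p)));
Console.WriteLine($" Perplexity from recorded probabilities: {recomputedPpl:F2}");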
Step 7: Build Adaptive Sampling Per Prompt
Combine everything into a system that automatically selects the right sampling strategy based on the task type:
// ──────────────────────────────────────
// 10. Adaptive sampling strategy selector
// ──────────────────────────────────────
static RandomSampling SelectSamplingStrategy(string prompt)
{
string lower = prompt.ToLowerInvariant();
// Classification and extraction: deterministic
if (lower.Contains("classify") || lower.Contains("extract") ||
lower.Contains("categorize") || lower.Contains("label"))
{
return new RandomSampling
{
Temperature = 0.05f,
TopK = 1,
TopP = 1.0f,
Seed = 42
};
}
// Creative writing: high exploration
if (lower.Contains("write") || lower.Contains("story") ||
lower.Contains("creative") || lower.Contains("poem"))
{
return new RandomSampling
{
Temperature = 0.85f,
TopP = 0.92f,
TopK = 80,
MinP = 0.02f
};
}
// Code generation: moderate precision
if (lower.Contains("code") || lower.Contains("implement") ||
lower.Contains("function") || lower.Contains("program"))
{
return new RandomSampling
{
Temperature = 0.3f,
TopP = 0.95f,
TopK = 40,
MinP = 0.05f
};
}
// Default: balanced
return new RandomSampling
{
Temperature = 0.7f,
TopP = 0.9f,
TopK = 50,
MinP = 0.05f
};
}
// Test adaptive sampling
string[] prompts = {
"Classify this text as spam or not-spam: 'You won a free iPhone! Click here!'",
"Write a short poem about autumn leaves falling in a quiet forest.",
"Implement a C# method that finds the longest palindrome in a string.",
"What are the three branches of the US government?"
};
foreach (string prompt in prompts)
{
var sampling = SelectSamplingStrategy(prompt);
var adaptiveChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = sampling
};
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n[Strategy: T={sampling.Temperature}, TopK={sampling.TopK}, TopP={sampling.TopP}]");
Console.ResetColor();
Console.WriteLine($"Q: {prompt}");
Console.WriteLine($"A: {adaptiveChat.Submit(prompt)}\n");
}
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Output is always identical | Temperature at 0 with TopK=1 | Increase temperature or TopK for variety; this is correct behavior for deterministic mode |
| Output contains repetitive loops | No repetition penalty configured | Set RepetitionPenalty.TokenCount to 128+ and RepeatPenalty to 1.1 or higher |
| Output is incoherent or nonsensical | Temperature too high, or MinP too low | Reduce temperature below 0.9; set MinP to 0.05+ to filter low-probability tokens |
| BeforeTokenSampling not firing | Event handler attached after first Submit | Attach event handlers before calling Submit |
| LogitBias has no visible effect | Bias weight too small | Use values of 3.0+ for noticeable boosting, or -5.0 or lower for suppression |
| Seed does not produce identical output | Other parameters introduce randomness | Ensure TopK=1 or Temperature near 0 alongside the seed |
Next Steps
- Enforce Structured Output with Grammar-Constrained Decoding: combine sampling control with grammar constraints for deterministic, schema-compliant output.
- Stream Agent Responses in Real Time: watch sampling decisions unfold token by token.
- Build a Content Moderation Filter: use logit biases and token-level interception for content safety.
- Handle Long Inputs with Overflow Policies: manage context length alongside sampling configuration.