Control Token Sampling with Dynamic Strategies
Every token a language model generates is chosen from a probability distribution over its entire vocabulary. The sampling strategy determines how that choice is made: should the model always pick the most likely token (deterministic), explore less likely tokens (creative), or something in between? LM-Kit.NET gives you full control over this process through RandomSampling, RepetitionPenalty, LogitBias, and raw logit manipulation via the BeforeTokenSampling and AfterTokenSampling events. This guide builds a sampling control system that adapts its strategy per prompt, manipulates token probabilities in real time, and implements custom guardrails at the token level.
Why Sampling Control Matters
Two production problems that sampling control solves:
- Deterministic output for data extraction and classification. When extracting structured data or classifying text, you need the model to produce the same output for the same input every time. Setting temperature near zero with a fixed seed eliminates randomness entirely, turning the LLM into a deterministic function; the sketch after this list shows why near-zero temperature collapses the choice onto the single most likely token. This is critical for testing, auditing, and reproducible pipelines.
- Preventing degenerate output patterns. Without repetition penalties, models fall into loops ("the the the the...") or repeat entire paragraphs verbatim. Without MinP filtering, they occasionally emit nonsensical low-probability tokens. Tuning these parameters eliminates pathological outputs while preserving natural language quality.
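The temperature mechanics behind the first point are worth seeing once: the sampler divides every logit by the temperature before applying softmax, so low values sharpen the distribution and high values flatten it. The sketch below is plain C# over a toy three-token vocabulary; it uses no LM-Kit.NET API and the numbers are made up:
// Toy demonstration of temperature scaling: p_i = softmax(logit_i / T).
// Lower T sharpens the distribution; higher T flattens it.
double[] logits = { 2.0, 1.0, 0.1 };
foreach (double t in new[] { 0.1, 1.0 })
{
    double[] weights = logits.Select(l => Math.Exp(l / t)).ToArray();
    double sum = weights.Sum();
    Console.WriteLine($"T={t}: " + string.Join(", ", weights.Select(w => (w / sum).ToString("F3"))));
}
// T=0.1 → 1.000, 0.000, 0.000 (near-greedy: the top token takes all the mass)
// T=1.0 → 0.659, 0.242, 0.099 (probability mass spreads across candidates)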
Prerequisites
| Requirement | Minimum |
|---|---|
| .NET SDK | 8.0+ |
| VRAM | 4+ GB |
| Disk | ~3 GB free for model download |
Step 1: Create the Project
dotnet new console -n SamplingControl
cd SamplingControl
dotnet add package LM-Kit.NET
Step 2: Configure RandomSampling Parameters
RandomSampling controls the core sampling behavior: temperature, top-p (nucleus), top-k, min-p, and the order in which these filters are applied:
using System.Text;
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;
using LMKit.TextGeneration.Events;
LMKit.Licensing.LicenseManager.SetLicenseKey("");
Console.InputEncoding = Encoding.UTF8;
Console.OutputEncoding = Encoding.UTF8;
// ──────────────────────────────────────
// 1. Load model
// ──────────────────────────────────────
Console.WriteLine("Loading model...");
using LM model = LM.LoadFromModelID("qwen3:4b",
loadingProgress: p => { Console.Write($"\rLoading: {p * 100:F0}% "); return true; });
Console.WriteLine("\n");
// ──────────────────────────────────────
// 2. Deterministic sampling (near-zero temperature)
// ──────────────────────────────────────
var deterministicSampling = new RandomSampling
{
Temperature = 0.01f, // Near-zero: always picks the most likely token
TopP = 1.0f, // Disabled: no nucleus filtering
TopK = 1, // Only consider the single most likely token
MinP = 0.0f, // Disabled
Seed = 42 // Fixed seed for reproducibility
};
var deterministicChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = deterministicSampling
};
Console.WriteLine("── Deterministic Output ──");
string result1 = deterministicChat.Submit("Classify this text as positive, negative, or neutral: 'The product works but the UI is terrible.'");
string result2 = deterministicChat.Submit("Classify this text as positive, negative, or neutral: 'The product works but the UI is terrible.'");
Console.WriteLine($"Run 1: {result1}");
Console.WriteLine($"Run 2: {result2}");
Console.WriteLine($"Identical: {result1 == result2}");
// ──────────────────────────────────────
// 3. Creative sampling (high temperature)
// ──────────────────────────────────────
var creativeSampling = new RandomSampling
{
Temperature = 0.9f, // High: explores diverse tokens
TopP = 0.95f, // Nucleus: keeps top 95% probability mass
TopK = 80, // Consider top 80 tokens
MinP = 0.02f, // Filter out tokens below 2% of the top token's probability
};
var creativeChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 512,
SamplingMode = creativeSampling
};
Console.WriteLine("\n── Creative Output ──");
string story = creativeChat.Submit("Write a one-paragraph opening for a mystery novel set in a space station.");
Console.WriteLine(story);
Parameter Reference
| Parameter | Range | Default | Effect |
|---|---|---|---|
| Temperature | 0.0 to 1.0 | 0.8 | Lower = more deterministic; higher = more creative |
| TopP | 0.0 to 1.0 | 0.95 | Nucleus sampling: keeps tokens whose cumulative probability reaches this threshold |
| TopK | 1 to 1000 | 40 | Only consider the top K most likely tokens |
| MinP | 0.0 to 1.0 | 0.05 | Filter tokens whose probability is below this fraction of the top token's probability |
| Seed | uint or null | null | Fixed seed for reproducible output |
| DynamicTemperatureRange | 0.0 to 1.0 | 0.0 | Adjusts temperature dynamically based on token entropy |
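MinP is the least intuitive of these filters, so here is the rule it applies in isolation: keep only tokens whose probability is at least MinP times the top candidate's probability. The sketch below uses made-up numbers and no LM-Kit.NET API:
// Toy MinP filter: keep tokens with p >= MinP * p_max.
float minP = 0.05f;
var candidates = new (string Token, float P)[]
{
    ("blue", 0.60f), ("green", 0.25f), ("red", 0.10f),
    ("qux", 0.02f), // 0.02 < 0.05 * 0.60 = 0.03, so this one is dropped
};
float threshold = minP * candidates.Max(c => c.P);
var kept = candidates.Where(c => c.P >= threshold).Select(c => c.Token);
Console.WriteLine(string.Join(", ", kept)); // blue, green, red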
Step 3: Control Sampler Ordering
The order in which sampling filters are applied changes the result. LM-Kit.NET lets you specify the exact sequence:
// ──────────────────────────────────────
// 4. Custom sampler pipeline order
// ──────────────────────────────────────
var customOrder = new RandomSampling
{
Temperature = 0.7f,
TopK = 50,
TopP = 0.9f,
MinP = 0.05f,
// Apply filters in this exact order:
SamplersSequence = new[]
{
RandomSamplers.TopK, // First: narrow to top 50 candidates
RandomSamplers.MinP, // Then: remove low-probability outliers
RandomSamplers.TopP, // Then: nucleus filter on remaining
RandomSamplers.Temperature, // Finally: apply temperature scaling
}
};
var orderedChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = customOrder
};
Console.WriteLine("\n── Custom Sampler Order ──");
string orderedResult = orderedChat.Submit("Suggest 3 creative names for a coffee shop in Tokyo.");
Console.WriteLine(orderedResult);
The default order is TopK → TailFree → LocallyTypical → TopP → MinP → Temperature. Changing the order lets you fine-tune how aggressively the candidate set is filtered before temperature scaling.
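To make the ordering effect concrete, the toy sketch below compares TopK → TopP against TopP → TopK on the same distribution. It assumes each stage renormalizes the surviving probabilities, as llama.cpp-style pipelines do; treat it as a mental model rather than LM-Kit.NET's exact internals:
// Toy comparison of sampler ordering on a sorted (descending) distribution.
var probs = new List<double> { 0.30, 0.25, 0.20, 0.15, 0.10 };
Console.WriteLine(TopP(TopK(probs, 3), 0.7).Count); // TopK first: 2 survivors
Console.WriteLine(TopK(TopP(probs, 0.7), 3).Count); // TopP first: 3 survivors
static List<double> TopK(List<double> p, int k) => Renorm(p.Take(k).ToList());
static List<double> TopP(List<double> p, double threshold)
{
    var kept = new List<double>();
    double cumulative = 0;
    foreach (double x in p)
    {
        kept.Add(x);
        cumulative += x;
        if (cumulative >= threshold) break; // nucleus threshold reached
    }
    return Renorm(kept);
}
static List<double> Renorm(List<double> p)
{
    double sum = p.Sum();
    return p.Select(x => x / sum).ToList();
}
Applying TopK first shrinks the pool to three candidates and renormalization inflates their probabilities, so the 0.7 nucleus is reached after only two tokens; applying TopP first needs three tokens to reach 0.7, and TopK=3 then keeps all of them.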
Step 4: Prevent Repetition with RepetitionPenalty
RepetitionPenalty penalizes tokens that have already appeared in the generated text, preventing the model from looping or repeating phrases:
// ──────────────────────────────────────
// 5. Repetition penalty configuration
// ──────────────────────────────────────
var chat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 512,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Configure repetition penalty
chat.RepetitionPenalty.TokenCount = 256; // Look back 256 tokens for repeats
chat.RepetitionPenalty.RepeatPenalty = 1.15f; // Penalize repeated tokens by 15%
chat.RepetitionPenalty.FrequencyPenalty = 0.1f; // Additional penalty based on frequency
chat.RepetitionPenalty.PresencePenalty = 0.1f; // Additional penalty for any token that appeared at all
chat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.WriteLine("\n── Repetition-Controlled Output ──\n");
Console.Write("Assistant: ");
chat.Submit("Write a detailed explanation of how neural networks learn through backpropagation.");
Console.WriteLine("\n");
RepetitionPenalty Parameters
| Parameter | Range | Default | Effect |
|---|---|---|---|
| TokenCount | 0 to 2048 | 0 (disabled) | How many previous tokens to check for repeats |
| RepeatPenalty | 0.0 to 2.0 | 1.1 | Multiplicative penalty for repeated tokens |
| FrequencyPenalty | -2.0 to 2.0 | 0.0 | Additive penalty proportional to token frequency |
| PresencePenalty | -2.0 to 2.0 | 0.0 | Additive penalty for any previously seen token |
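As a mental model for how the three penalties interact, the sketch below follows the llama.cpp-style convention: RepeatPenalty divides positive logits (or multiplies negative ones), then the frequency and presence terms are subtracted. LM-Kit.NET's internal formula may differ in detail:
// Hypothetical helper showing the conventional penalty combination
// (not LM-Kit.NET's literal implementation).
static float PenalizedLogit(float logit, int occurrences,
    float repeatPenalty, float frequencyPenalty, float presencePenalty)
{
    if (occurrences == 0) return logit; // outside the lookback window: untouched
    logit = logit > 0 ? logit / repeatPenalty : logit * repeatPenalty;
    return logit - (frequencyPenalty * occurrences + presencePenalty);
}
// A token with logit 2.0 that already appeared 3 times in the window:
Console.WriteLine(PenalizedLogit(2.0f, 3, 1.15f, 0.1f, 0.1f)); // ≈ 1.34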
Step 5: Steer Output with LogitBias
LogitBias lets you boost or suppress specific words and phrases before sampling. This is powerful for steering the model toward (or away from) specific terminology without changing the prompt:
// ──────────────────────────────────────
// 6. Steer output with logit biases
// ──────────────────────────────────────
var biasChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Boost certain terms
biasChat.LogitBias.AddTextChunkBias("sustainable", 3.0f);
biasChat.LogitBias.AddTextChunkBias("eco-friendly", 3.0f);
biasChat.LogitBias.AddTextChunkBias("renewable", 2.5f);
// Suppress certain terms
biasChat.LogitBias.AddTextChunkBias("cheap", -5.0f);
biasChat.LogitBias.AddTextChunkBias("expensive", -5.0f);
Console.WriteLine("── Biased Output (sustainability focus) ──");
string biasedResult = biasChat.Submit("Describe the benefits of solar energy for homeowners.");
Console.WriteLine(biasedResult);
// ──────────────────────────────────────
// 7. Disallow specific text chunks entirely
// ──────────────────────────────────────
var restrictedChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Block competitor names from appearing in output
restrictedChat.LogitBias.DisallowTextChunk("CompetitorBrand");
restrictedChat.LogitBias.DisallowTextChunk("RivalProduct");
Console.WriteLine("\n── Restricted Output (blocked terms) ──");
string restrictedResult = restrictedChat.Submit("What are the best tools for project management?");
Console.WriteLine(restrictedResult);
// ──────────────────────────────────────
// 8. Allow only numeric/letter output
// ──────────────────────────────────────
var numericChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 32,
SamplingMode = new RandomSampling { Temperature = 0.3f }
};
numericChat.LogitBias.AllowOnlyIntegerTextChunks(allowSpacing: true);
Console.WriteLine("\n── Numeric-Only Output ──");
string numericResult = numericChat.Submit("What is 42 * 17?");
Console.WriteLine($"Result: {numericResult}");
Step 6: Manipulate Logits in Real Time
For the finest-grained control, hook into the BeforeTokenSampling and AfterTokenSampling events to inspect and modify the raw probability distribution at every token generation step:
// ──────────────────────────────────────
// 9. Real-time logit manipulation
// ──────────────────────────────────────
var advancedChat = new MultiTurnConversation(model)
{
SystemPrompt = "You are a helpful assistant.",
MaximumCompletionTokens = 256,
SamplingMode = new RandomSampling { Temperature = 0.7f }
};
// Track generation metrics
var perplexities = new List<double>();
var tokenProbabilities = new List<float>();
advancedChat.BeforeTokenSampling += (sender, e) =>
{
// Record the perplexity at each step
perplexities.Add(e.Perplexity);
// Example: if perplexity is very high (model is uncertain),
// we could stop generation early
if (e.Perplexity > 100.0 && e.GeneratedTokens.Count > 50)
{
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine($"\n [High uncertainty detected: perplexity {e.Perplexity:F1}]");
Console.ResetColor();
// Uncomment to stop: e.Stop = true;
}
};
advancedChat.AfterTokenSampling += (sender, e) =>
{
// Record the probability of each chosen token
tokenProbabilities.Add(e.TokenProbability);
// Example: replace very low-probability tokens
// This acts as a safety net against hallucinations
if (e.TokenProbability < 0.001f && e.ContextRemainingSpace > 10)
{
// Get the highest-probability alternative
int topToken = e.GetTokenCandidateByRank(0);
float topProb = e.GetTokenCandidateProbabilityByRank(0);
// Optionally override the sampled token
// e.Token = topToken;
}
};
advancedChat.AfterTextCompletion += (_, e) =>
{
if (e.SegmentType == TextSegmentType.UserVisible)
Console.Write(e.Text);
};
Console.Write("\n── Real-Time Logit Analysis ──\nAssistant: ");
advancedChat.Submit("Explain three key principles of clean code architecture.");
Console.WriteLine();
// Display generation statistics
if (perplexities.Count > 0)
{
Console.ForegroundColor = ConsoleColor.Cyan;
Console.WriteLine($"\n── Generation Statistics ──");
Console.WriteLine($" Tokens generated: {perplexities.Count}");
Console.WriteLine($" Avg perplexity: {perplexities.Average():F2}");
Console.WriteLine($" Max perplexity: {perplexities.Max():F2}");
Console.WriteLine($" Avg token probability: {tokenProbabilities.Average():F4}");
Console.WriteLine($" Min token probability: {tokenProbabilities.Min():F6}");
Console.ResetColor();
}
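For reference when reading these statistics: perplexity over N tokens with chosen-token probabilities p_i is conventionally exp(-(1/N) Σ ln p_i). Assuming e.Perplexity is derived the same way, you can cross-check it from the probabilities recorded above:
// Standard perplexity from the recorded token probabilities:
// PPL = exp(-(1/N) * Σ ln p_i); lower means the model was more confident.
double recomputedPpl = Math.Exp(-tokenProbabilities.Average(p => Math.Log(p)));
Console.WriteLine($" Perplexity from recorded probabilities: {recomputedPpl:F2}");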
Step 7: Build Adaptive Sampling Per Prompt
Combine everything into a system that automatically selects the right sampling strategy based on the task type:
// ──────────────────────────────────────
// 10. Adaptive sampling strategy selector
// ──────────────────────────────────────
static RandomSampling SelectSamplingStrategy(string prompt)
{
string lower = prompt.ToLowerInvariant();
// Classification and extraction: deterministic
if (lower.Contains("classify") || lower.Contains("extract") ||
lower.Contains("categorize") || lower.Contains("label"))
{
return new RandomSampling
{
Temperature = 0.05f,
TopK = 1,
TopP = 1.0f,
Seed = 42
};
}
// Creative writing: high exploration
if (lower.Contains("write") || lower.Contains("story") ||
lower.Contains("creative") || lower.Contains("poem"))
{
return new RandomSampling
{
Temperature = 0.85f,
TopP = 0.92f,
TopK = 80,
MinP = 0.02f
};
}
// Code generation: moderate precision
if (lower.Contains("code") || lower.Contains("implement") ||
lower.Contains("function") || lower.Contains("program"))
{
return new RandomSampling
{
Temperature = 0.3f,
TopP = 0.95f,
TopK = 40,
MinP = 0.05f
};
}
// Default: balanced
return new RandomSampling
{
Temperature = 0.7f,
TopP = 0.9f,
TopK = 50,
MinP = 0.05f
};
}
// Test adaptive sampling
string[] prompts = {
"Classify this text as spam or not-spam: 'You won a free iPhone! Click here!'",
"Write a short poem about autumn leaves falling in a quiet forest.",
"Implement a C# method that finds the longest palindrome in a string.",
"What are the three branches of the US government?"
};
foreach (string prompt in prompts)
{
var sampling = SelectSamplingStrategy(prompt);
var adaptiveChat = new SingleTurnConversation(model)
{
MaximumCompletionTokens = 256,
SamplingMode = sampling
};
Console.ForegroundColor = ConsoleColor.DarkGray;
Console.WriteLine($"\n[Strategy: T={sampling.Temperature}, TopK={sampling.TopK}, TopP={sampling.TopP}]");
Console.ResetColor();
Console.WriteLine($"Q: {prompt}");
Console.WriteLine($"A: {adaptiveChat.Submit(prompt)}\n");
}
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| Output is always identical | Temperature at 0 with TopK=1 | Increase temperature or TopK for variety; this is correct behavior for deterministic mode |
| Output contains repetitive loops | No repetition penalty configured | Set RepetitionPenalty.TokenCount to 128+ and RepeatPenalty to 1.1 or higher |
| Output is incoherent or nonsensical | Temperature too high, or MinP too low | Reduce temperature below 0.9; set MinP to 0.05+ to filter low-probability tokens |
| BeforeTokenSampling not firing | Event handler attached after first Submit | Attach event handlers before calling Submit |
| LogitBias has no visible effect | Bias weight too small | Use values of 3.0+ for noticeable boosting, or -5.0 or lower for suppression |
| Seed does not produce identical output | Other parameters introduce randomness | Ensure TopK=1 or Temperature near 0 alongside the seed |
Next Steps
- Enforce Structured Output with Grammar-Constrained Decoding: combine sampling control with grammar constraints for deterministic, schema-compliant output.
- Stream Agent Responses in Real Time: watch sampling decisions unfold token by token.
- Build a Content Moderation Filter: use logit biases and token-level interception for content safety.
- Handle Long Inputs with Overflow Policies: manage context length alongside sampling configuration.