What is Temperature in Large Language Models?
TL;DR
Temperature is a parameter that controls the randomness of a language model's output. A low temperature (close to 0) makes the model more deterministic, always picking the highest-probability token. A high temperature (close to 1 or above) flattens the probability distribution, allowing less likely tokens to be selected, producing more creative and varied responses. In LM-Kit.NET, temperature is set through the RandomSampling class in the LMKit.TextGeneration.Sampling namespace, with a default value of 0.8.
What is Temperature?
Definition: Temperature is a scaling factor applied to the logits (raw prediction scores) of a language model before they are converted into a probability distribution via the softmax function. Mathematically:
P(token_i) = exp(logit_i / T) / sum(exp(logit_j / T))
Where T is the temperature. This single parameter has a profound effect on how the model selects the next token:
- T → 0: The distribution becomes sharply peaked. The highest-probability token dominates. Output is nearly deterministic.
- T = 1: The distribution matches the model's learned probabilities without modification.
- T > 1: The distribution flattens. Lower-probability tokens gain a larger share. Output becomes more random, creative, and unpredictable.
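The effect of this scaling is easy to verify numerically. The following Python sketch (illustrative only, not LM-Kit.NET code; the logit values are made up) applies the formula above to three candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaling by temperature first."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 3.0, 1.0]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # sharply peaked
warm = softmax_with_temperature(logits, 1.0)  # model's native distribution
hot  = softmax_with_temperature(logits, 2.0)  # flattened
```

At T = 0.1 the top token absorbs almost all probability mass; at T = 2.0 the gap between candidates narrows considerably.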
Intuition
Imagine the model deciding its next word. It has a ranked list of candidates:
Temperature = 0.1 (nearly deterministic):
- "Paris" → 98.5%
- "Lyon" → 1.2%
- "Berlin" → 0.3%

Temperature = 1.0 (balanced):
- "Paris" → 72.0%
- "Lyon" → 18.0%
- "Berlin" → 10.0%

Temperature = 2.0 (high creativity):
- "Paris" → 40.0%
- "Lyon" → 32.0%
- "Berlin" → 28.0%
At low temperature, the model almost always picks "Paris." At high temperature, it frequently explores alternatives like "Lyon" or "Berlin," which may be creative in a storytelling context but wrong in a factual one.
Temperature and Task Types
Choosing the right temperature depends on what you need from the model:
| Task | Recommended Temperature | Why |
|---|---|---|
| Factual Q&A | 0.0 - 0.3 | Accuracy matters more than variety |
| Code generation | 0.0 - 0.2 | Code must be syntactically and logically correct |
| Data extraction | 0.0 - 0.1 | Output must match schema precisely |
| General conversation | 0.6 - 0.8 | Balance between coherence and naturalness |
| Creative writing | 0.8 - 1.2 | Encourage novel phrasing and ideas |
| Brainstorming | 1.0 - 1.5 | Maximize diversity of suggestions |
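These ranges can be captured as a simple lookup in application code. The snippet below is a hypothetical Python helper, not part of any SDK; the task labels are invented names for the rows above:

```python
# Illustrative temperature presets based on the table above (not an SDK API)
TEMPERATURE_PRESETS = {
    "factual_qa": (0.0, 0.3),
    "code_generation": (0.0, 0.2),
    "data_extraction": (0.0, 0.1),
    "conversation": (0.6, 0.8),
    "creative_writing": (0.8, 1.2),
    "brainstorming": (1.0, 1.5),
}

def pick_temperature(task: str) -> float:
    """Return the midpoint of the recommended range for a task."""
    lo, hi = TEMPERATURE_PRESETS[task]
    return (lo + hi) / 2
```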
The Creativity-Accuracy Trade-Off
```
Low temperature                                          High temperature
(deterministic)                                              (creative)
      |                                                          |
      v                                                          v
+------------+    +------------+    +------------+    +------------+
|  Accurate  |    |  Reliable  |    |   Varied   |    |  Creative  |
| Repetitive |    |  Balanced  |    |   Natural  |    |   Risky    |
|    Safe    |    |            |    |            |    |   Novel    |
+------------+    +------------+    +------------+    +------------+
    T ≈ 0            T ≈ 0.3           T ≈ 0.8           T ≈ 1.5
```
Higher temperature increases the risk of hallucination because the model is more likely to select low-probability tokens that lead to incorrect or fabricated content.
Temperature and Other Sampling Parameters
Temperature does not work in isolation. It is one stage in a multi-step sampling pipeline. In LM-Kit.NET's RandomSampling class, the pipeline processes logits through a configurable sequence of samplers:
| Sampler | What It Does | Interaction with Temperature |
|---|---|---|
| Top-K | Keep only the K most probable tokens | Reduces the candidate pool before temperature is applied |
| Top-P (Nucleus) | Keep tokens whose cumulative probability reaches P | Dynamic candidate filtering, works with temperature |
| Min-P | Remove tokens below a minimum probability threshold | Cuts the tail after temperature scaling |
| Temperature | Scale logits to control randomness | Core randomness control |
| Locally Typical | Prefer tokens with typical (expected) information content | Complements temperature by filtering outliers |
The default sampler sequence in LM-Kit.NET is: Top-K → Tail-Free → Locally Typical → Top-P → Min-P → Temperature. Temperature scaling is applied last, after the filtering stages have pruned the candidate pool.
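The filter-then-scale ordering can be sketched in a few lines of illustrative Python (a simplified pipeline with only Top-K and temperature; the token names and logit values are made up, and this is not the SDK's implementation):

```python
import math
import random

def top_k(logits, k):
    """Keep only the k highest-scoring candidate tokens."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def apply_temperature(logits, temperature):
    """Scale the surviving logits, then softmax into probabilities."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp((l - m) / temperature) for t, l in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def sample_token(logits, k, temperature, rng):
    """Filter first, apply temperature last, then draw one token."""
    candidates = top_k(logits, k)
    probs = apply_temperature(candidates, temperature)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = {"Paris": 5.0, "Lyon": 3.0, "Berlin": 1.0, "Tokyo": -2.0}
token = sample_token(logits, k=3, temperature=0.8, rng=random.Random(42))
```

Because Top-K runs first, "Tokyo" is removed before temperature has any say; temperature then only reshapes the odds among the survivors.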
Dynamic Temperature
LM-Kit.NET supports dynamic temperature, where the temperature varies based on the entropy of the logit distribution at each token position:
- High-entropy positions (the model is uncertain): temperature is lowered to keep output coherent
- Low-entropy positions (the model is confident): temperature is raised to allow natural variation
This is controlled by the DynamicTemperatureRange property on RandomSampling. A value of 0 disables it (default).
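One possible realization of this behavior is sketched below in Python. The entropy-to-temperature mapping is an assumption chosen to match the description above and the 0.5-1.1 range produced by Temperature = 0.8 with DynamicTemperatureRange = 0.3; LM-Kit.NET's internal formula may differ:

```python
import math

def entropy(probs):
    """Shannon entropy of a probability distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dynamic_temperature(probs, base=0.8, dyn_range=0.3):
    """Map entropy into [base - dyn_range, base + dyn_range].

    Illustrative mapping following the description above: a confident
    (low-entropy) distribution gets a higher temperature, an uncertain
    (high-entropy) one gets a lower temperature.
    """
    max_entropy = math.log(len(probs))  # entropy of a uniform distribution
    confidence = 1.0 - entropy(probs) / max_entropy  # 1.0 = fully confident
    return base - dyn_range + 2 * dyn_range * confidence
```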
Practical Application in LM-Kit.NET SDK
Sampling Classes
LM-Kit.NET provides several sampling strategies, all extending the abstract TokenSampling base class:
| Class | Temperature | Description |
|---|---|---|
| RandomSampling | 0.8 (default) | Standard sampling with Top-K, Top-P, Min-P, and temperature |
| GreedyDecoding | N/A (always 0) | Always picks the highest-probability token. Fully deterministic. |
| MirostatSampling | 0.8 (default) | Perplexity-targeting algorithm that dynamically adjusts token selection |
| Mirostat2Sampling | 0.8 (default) | Improved variant of Mirostat |
| TopNSigmaSampling | 0.8 (default) | Statistical outlier filtering, optimized for reasoning tasks |
Code Examples
Setting Temperature on a Conversation
```csharp
using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;

var model = LM.LoadFromModelID("gemma3:12b");
using var chat = new MultiTurnConversation(model);

// Low temperature for factual tasks
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.1f,
    TopK = 40,
    TopP = 0.95f
};

var factualAnswer = await chat.SubmitAsync("What is the capital of France?");
```
Creative Writing with High Temperature
```csharp
// High temperature for creative tasks
chat.SamplingMode = new RandomSampling
{
    Temperature = 1.2f,
    TopK = 100,
    TopP = 0.98f,
    MinP = 0.02f
};

var story = await chat.SubmitAsync("Write the opening line of a mystery novel.");
```
Deterministic Output with Greedy Decoding
```csharp
// Zero temperature equivalent: always pick the most likely token
chat.SamplingMode = new GreedyDecoding();

var deterministic = await chat.SubmitAsync("Translate 'hello' to French.");
// Always produces exactly the same output for the same input
```
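Conceptually, greedy decoding reduces to taking the argmax over the logits at every step, so no random draw is involved. A minimal sketch in Python (illustrative only, with made-up logit values):

```python
def greedy_pick(logits):
    """Pick the single highest-scoring token; no randomness involved."""
    return max(logits, key=logits.get)

logits = {"bonjour": 4.2, "salut": 2.1, "hello": 0.5}
# Repeated calls always return the same token for the same logits.
```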
Dynamic Temperature
```csharp
// Let temperature adapt to model confidence at each token
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.8f,
    DynamicTemperatureRange = 0.3f // Temperature varies between 0.5 and 1.1
};
```
Reproducible Output with Seed
```csharp
// Same temperature + same seed = same output across runs
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.7f,
    Seed = 42
};
```
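The guarantee rests on seeding the sampler's random number generator so that every draw is repeated exactly. In illustrative Python terms (the weights and helper below are hypothetical, not SDK code):

```python
import random

def sample_with_seed(weights, seed, n=5):
    """Draw n tokens from a weighted distribution using a fixed seed."""
    rng = random.Random(seed)
    tokens = list(weights)
    return [rng.choices(tokens, weights=list(weights.values()), k=1)[0]
            for _ in range(n)]

weights = {"a": 0.5, "b": 0.3, "c": 0.2}
run1 = sample_with_seed(weights, seed=42)
run2 = sample_with_seed(weights, seed=42)
# run1 == run2: an identical seed yields an identical token sequence
```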
Key Terms
- Temperature: A scaling factor applied to logits before softmax, controlling the randomness of token selection.
- Logits: The raw, unnormalized prediction scores output by the model for each token in the vocabulary.
- Softmax: The function that converts logits into a probability distribution summing to 1.
- Greedy Decoding: Selecting the highest-probability token at every step (equivalent to temperature = 0).
- Top-K Sampling: Restricting token selection to the K highest-probability candidates.
- Top-P (Nucleus) Sampling: Restricting token selection to the smallest set of tokens whose cumulative probability exceeds P.
- Dynamic Temperature: Automatically varying temperature based on the entropy of the logit distribution at each token position.
- Seed: A fixed random number generator seed for reproducible sampling results.
Related API Documentation
- RandomSampling: Standard sampling with temperature, Top-K, Top-P, and Min-P
- GreedyDecoding: Deterministic decoding (temperature = 0)
- MirostatSampling: Perplexity-targeting sampling with temperature
- TokenSampling: Abstract base class for all sampling strategies
- RepetitionPenalty: Penalty parameters to reduce repetitive output
Related Glossary Topics
- Sampling: The broader token selection process that temperature is part of
- Dynamic Sampling: Advanced sampling that adapts strategies per token
- Logits: The raw scores that temperature scales
- Hallucination: Higher temperature increases hallucination risk
- Inference: The generation process where sampling occurs
- Perplexity: A measure of model uncertainty, related to temperature effects
- Grammar Sampling: Constrained decoding that works alongside temperature
- Chat Completion: The conversation mode where temperature is commonly tuned
- Prompt Engineering: Prompt design and temperature work together to shape output quality
External Resources
- The Curious Case of Neural Text Degeneration (Holtzman et al., 2020): Analysis of sampling strategies including temperature, introducing nucleus sampling
- Mirostat: A Neural Text Decoding Algorithm (Basu et al., 2021): Perplexity-aware sampling that adapts dynamically
- Efficient Entropy-Based Sampling (Beurer-Kellner et al., 2024): Adaptive sampling based on token-level entropy
Summary
Temperature is the single most important parameter for controlling the trade-off between accuracy and creativity in language model output. By scaling logits before the softmax function, temperature determines whether the model picks the safest, highest-probability token (low temperature) or explores less likely alternatives (high temperature). In LM-Kit.NET, temperature is configured through the RandomSampling class (default: 0.8), alongside complementary parameters like Top-K, Top-P, and Min-P. For deterministic output, GreedyDecoding provides a zero-temperature equivalent. The DynamicTemperatureRange property enables adaptive temperature that varies with model confidence at each token position, offering the best of both worlds: coherence where the model is uncertain, variety where it is confident.