What is Temperature in Large Language Models?


TL;DR

Temperature is a parameter that controls the randomness of a language model's output. A low temperature (close to 0) makes the model more deterministic, always picking the highest-probability token. A high temperature (close to 1 or above) flattens the probability distribution, allowing less likely tokens to be selected, producing more creative and varied responses. In LM-Kit.NET, temperature is set through the RandomSampling class in the LMKit.TextGeneration.Sampling namespace, with a default value of 0.8.


What is Temperature?

Definition: Temperature is a scaling factor applied to the logits (raw prediction scores) of a language model before they are converted into a probability distribution via the softmax function. Mathematically:

P(token_i) = exp(logit_i / T) / sum(exp(logit_j / T))

Where T is the temperature. This single parameter has a profound effect on how the model selects the next token:

  • T → 0: The distribution becomes sharply peaked. The highest-probability token dominates. Output is nearly deterministic.
  • T = 1: The distribution matches the model's learned probabilities without modification.
  • T > 1: The distribution flattens. Lower-probability tokens gain a larger share. Output becomes more random, creative, and unpredictable.
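The effect of T can be seen directly by computing the formula above. The sketch below is plain Python (not LM-Kit.NET API) using hypothetical logit values for three candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T, then apply softmax (illustrative sketch)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 3.5, 2.0]  # hypothetical raw scores for three tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the distribution collapsing onto the top token at T = 0.1 and flattening toward the alternatives at T = 2.0, exactly as the bullet points describe.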

Intuition

Imagine the model deciding its next word. It has a ranked list of candidates:

Temperature = 0.1 (nearly deterministic):
  "Paris"  → 98.5%
  "Lyon"   →  1.2%
  "Berlin" →  0.3%

Temperature = 1.0 (balanced):
  "Paris"  → 72.0%
  "Lyon"   → 18.0%
  "Berlin" → 10.0%

Temperature = 2.0 (high creativity):
  "Paris"  → 40.0%
  "Lyon"   → 32.0%
  "Berlin" → 28.0%

At low temperature, the model almost always picks "Paris." At high temperature, it frequently explores alternatives like "Lyon" or "Berlin," which may be creative in a storytelling context but wrong in a factual one.
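To make that concrete, the illustrative Python snippet below (not LM-Kit.NET code) repeatedly samples from the two extreme distributions in the table above, using a fixed seed for reproducibility:

```python
import random

candidates = ["Paris", "Lyon", "Berlin"]
low_temp  = [0.985, 0.012, 0.003]  # distribution at T = 0.1
high_temp = [0.40, 0.32, 0.28]     # distribution at T = 2.0

rng = random.Random(42)  # fixed seed so the run is reproducible
low_picks  = rng.choices(candidates, weights=low_temp,  k=1000)
high_picks = rng.choices(candidates, weights=high_temp, k=1000)

print("T=0.1:", low_picks.count("Paris"), "of 1000 picks are 'Paris'")
print("T=2.0:", high_picks.count("Paris"), "of 1000 picks are 'Paris'")
```

At T = 0.1 nearly every draw is "Paris"; at T = 2.0 the alternatives appear in a majority of draws combined.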


Temperature and Task Types

Choosing the right temperature depends on what you need from the model:

Task                   Recommended Temperature   Why
Factual Q&A            0.0 - 0.3                 Accuracy matters more than variety
Code generation        0.0 - 0.2                 Code must be syntactically and logically correct
Data extraction        0.0 - 0.1                 Output must match schema precisely
General conversation   0.6 - 0.8                 Balance between coherence and naturalness
Creative writing       0.8 - 1.2                 Encourage novel phrasing and ideas
Brainstorming          1.0 - 1.5                 Maximize diversity of suggestions

The Creativity-Accuracy Trade-Off

Low temperature                                    High temperature
(deterministic)                                    (creative)
      |                                                  |
      v                                                  v
+------------+   +------------+   +------------+   +------------+
| Accurate   |   | Reliable   |   | Varied     |   | Creative   |
| Repetitive |   | Balanced   |   | Natural    |   | Risky      |
| Safe       |   |            |   |            |   | Novel      |
+------------+   +------------+   +------------+   +------------+
    T ≈ 0           T ≈ 0.3          T ≈ 0.8          T ≈ 1.5

Higher temperature increases the risk of hallucination because the model is more likely to select low-probability tokens that lead to incorrect or fabricated content.


Temperature and Other Sampling Parameters

Temperature does not work in isolation. It is one stage in a multi-step sampling pipeline. In LM-Kit.NET's RandomSampling class, the pipeline processes logits through a configurable sequence of samplers:

Sampler           What It Does                                                What It Means for Temperature
Top-K             Keep only the K most probable tokens                        Reduces the candidate pool before temperature is applied
Top-P (Nucleus)   Keep tokens whose cumulative probability reaches P          Dynamic candidate filtering that works alongside temperature
Min-P             Remove tokens below a minimum probability threshold         Cuts the tail of the distribution
Temperature       Scale logits to control randomness                          Core randomness control
Locally Typical   Prefer tokens with typical (expected) information content   Complements temperature by filtering outliers

The default sampler sequence in LM-Kit.NET is: Top-K → Tail-Free → Locally Typical → Top-P → Min-P → Temperature.
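As a rough illustration of how such a pipeline chains filters before the final draw, the Python sketch below applies a Top-K filter, then temperature scaling, then samples from the surviving candidates. The function names and values are hypothetical and do not reflect LM-Kit.NET internals:

```python
import math, random

def top_k(logits, k):
    """Keep only the k highest logits; mask the rest to -inf."""
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [l if l >= cutoff else float("-inf") for l in logits]

def apply_temperature(logits, t):
    """Scale logits by 1/T (masked tokens stay at -inf)."""
    return [l / t for l in logits]

def sample(logits, rng):
    """Softmax the logits and draw one token index."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # exp(-inf) -> 0.0
    total = sum(exps)
    return rng.choices(range(len(logits)), weights=[e / total for e in exps], k=1)[0]

logits = [4.0, 3.0, 2.5, 0.5, -1.0]
rng = random.Random(0)
stage1 = top_k(logits, k=3)              # candidate pool shrinks first...
stage2 = apply_temperature(stage1, 0.7)  # ...then temperature rescales it
print(sample(stage2, rng))
```

Because Top-K runs first, the two lowest-scoring tokens can never be selected no matter how high the temperature is afterward, which is why filter order matters.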

Dynamic Temperature

LM-Kit.NET supports dynamic temperature, where the temperature varies based on the entropy of the logit distribution at each token position:

  • High-entropy positions (the model is uncertain): temperature is lowered to keep the output coherent
  • Low-entropy positions (the model is confident): temperature is raised to allow natural variation

This is controlled by the DynamicTemperatureRange property on RandomSampling. A value of 0 disables it (default).
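One way such a scheme could be sketched, following the description above (lower temperature at high entropy, higher at low entropy), is shown below in illustrative Python. The mapping is hypothetical and not the actual LM-Kit.NET algorithm:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def dynamic_temperature(probs, base_temp=0.8, dyn_range=0.3):
    """Map confidence to a temperature in [base - range, base + range].

    Follows the rule described above: the more uncertain the model
    (higher normalized entropy), the lower the temperature.
    """
    max_entropy = math.log2(len(probs))              # uniform distribution
    confidence = 1.0 - entropy(probs) / max_entropy  # 1.0 = fully confident
    return (base_temp - dyn_range) + 2 * dyn_range * confidence

confident = [0.97, 0.02, 0.01]  # model is sure  -> temperature rises
uncertain = [0.40, 0.35, 0.25]  # model is unsure -> temperature drops
print(round(dynamic_temperature(confident), 2))
print(round(dynamic_temperature(uncertain), 2))
```

With base 0.8 and range 0.3 (mirroring the DynamicTemperatureRange example later in this article), the effective temperature stays within 0.5 to 1.1.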


Practical Application in LM-Kit.NET SDK

Sampling Classes

LM-Kit.NET provides several sampling strategies, all extending the abstract TokenSampling base class:

Class               Temperature      Description
RandomSampling      0.8 (default)    Standard sampling with Top-K, Top-P, Min-P, and temperature
GreedyDecoding      N/A (always 0)   Always picks the highest-probability token; fully deterministic
MirostatSampling    0.8 (default)    Perplexity-targeting algorithm that dynamically adjusts token selection
Mirostat2Sampling   0.8 (default)    Improved variant of Mirostat
TopNSigmaSampling   0.8 (default)    Statistical outlier filtering, optimized for reasoning tasks

Code Example

Setting Temperature on a Conversation

using LMKit.Model;
using LMKit.TextGeneration;
using LMKit.TextGeneration.Sampling;

var model = LM.LoadFromModelID("gemma3:12b");
using var chat = new MultiTurnConversation(model);

// Low temperature for factual tasks
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.1f,
    TopK = 40,
    TopP = 0.95f
};

var factualAnswer = await chat.SubmitAsync("What is the capital of France?");

Creative Writing with High Temperature

// High temperature for creative tasks
chat.SamplingMode = new RandomSampling
{
    Temperature = 1.2f,
    TopK = 100,
    TopP = 0.98f,
    MinP = 0.02f
};

var story = await chat.SubmitAsync("Write the opening line of a mystery novel.");

Deterministic Output with Greedy Decoding

// Zero temperature equivalent: always pick the most likely token
chat.SamplingMode = new GreedyDecoding();

var deterministic = await chat.SubmitAsync("Translate 'hello' to French.");
// Always produces exactly the same output for the same input

Dynamic Temperature

// Let temperature adapt to model confidence at each token
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.8f,
    DynamicTemperatureRange = 0.3f  // Temperature varies between 0.5 and 1.1
};

Reproducible Output with Seed

// Same temperature + same seed = same output across runs
chat.SamplingMode = new RandomSampling
{
    Temperature = 0.7f,
    Seed = 42
};

Key Terms

  • Temperature: A scaling factor applied to logits before softmax, controlling the randomness of token selection.
  • Logits: The raw, unnormalized prediction scores output by the model for each token in the vocabulary.
  • Softmax: The function that converts logits into a probability distribution summing to 1.
  • Greedy Decoding: Selecting the highest-probability token at every step (equivalent to temperature = 0).
  • Top-K Sampling: Restricting token selection to the K highest-probability candidates.
  • Top-P (Nucleus) Sampling: Restricting token selection to the smallest set of tokens whose cumulative probability exceeds P.
  • Dynamic Temperature: Automatically varying temperature based on the entropy of the logit distribution at each token position.
  • Seed: A fixed random number generator seed for reproducible sampling results.


Related Concepts

  • Sampling: The broader token selection process that temperature is part of
  • Dynamic Sampling: Advanced sampling that adapts strategies per token
  • Logits: The raw scores that temperature scales
  • Hallucination: Higher temperature increases hallucination risk
  • Inference: The generation process where sampling occurs
  • Perplexity: A measure of model uncertainty, related to temperature effects
  • Grammar Sampling: Constrained decoding that works alongside temperature
  • Chat Completion: The conversation mode where temperature is commonly tuned
  • Prompt Engineering: Prompt design and temperature work together to shape output quality


Summary

Temperature is the single most important parameter for controlling the trade-off between accuracy and creativity in language model output. By scaling logits before the softmax function, temperature determines whether the model picks the safest, highest-probability token (low temperature) or explores less likely alternatives (high temperature). In LM-Kit.NET, temperature is configured through the RandomSampling class (default: 0.8), alongside complementary parameters like Top-K, Top-P, and Min-P. For deterministic output, GreedyDecoding provides a zero-temperature equivalent. The DynamicTemperatureRange property enables adaptive temperature that varies with model confidence at each token position, offering the best of both worlds: coherence where the model is uncertain, variety where it is confident.
