Temperature and top-p sampling control how creative or deterministic an LLM's output is. Understanding them helps you get consistent, predictable results.
LLMs don't pick the "correct" next word — they generate a probability distribution over all possible next tokens, then sample from it.
Input: "The capital of France is"
Token probabilities:
"Paris" → 94.2%
"Lyon" → 2.1%
"a" → 1.8%
"the" → 0.9%
...
Temperature and top-p control how you sample from this distribution.
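To make the sampling step concrete, here is a minimal sketch that draws tokens from the example distribution above using only the standard library. The `"<other>"` bucket and the `sample_next_token` helper are assumptions added for illustration: the bucket lumps together the remaining probability mass so the weights sum to 1, and a real model samples over its full vocabulary.

```python
import random

# Probabilities from the example above. The remaining 1% is lumped into a
# hypothetical "<other>" bucket so that the weights sum to 1.
next_token_probs = {
    "Paris": 0.942,
    "Lyon": 0.021,
    "a": 0.018,
    "the": 0.009,
    "<other>": 0.010,
}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [sample_next_token(next_token_probs, rng) for _ in range(1000)]

# "Paris" dominates, but the less likely tokens still show up occasionally.
print(draws.count("Paris") / len(draws))
```

Run over many draws, the share of "Paris" should land close to its probability, while the rare tokens account for the occasional surprising output.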
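Temperature rescales the probability distribution before sampling: the model's logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The OpenAI API accepts values between 0 and 2; other providers may use a narrower range.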
| Temperature | Behaviour | Use when |
|---|---|---|
| 0 | Always picks highest probability token | Facts, code, structured data |
| 0.3 - 0.7 | Mostly predictable, slight variation | Summaries, Q&A, analysis |
| 0.7 - 1.0 | Balanced creativity | General chat, writing assistance |
| 1.0 - 2.0 | High creativity, less coherent | Brainstorming, creative writing |
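The effect of the values in the table can be sketched numerically. The block below assumes temperature is applied in the usual way, by dividing the logits by T before the softmax, and uses log-probabilities as a stand-in for the logits. The `apply_temperature` helper and the `"<other>"` bucket are hypothetical names for this illustration, and treating the leftover mass as a single token is a simplification.

```python
import math

# Example distribution; "<other>" is a hypothetical bucket for the rest.
probs = {"Paris": 0.942, "Lyon": 0.021, "a": 0.018, "the": 0.009, "<other>": 0.010}


def apply_temperature(probs, temperature):
    """Rescale a distribution as if its logits were divided by temperature."""
    if temperature <= 0:
        # Treat T = 0 as greedy decoding: all mass goes to the top token.
        top = max(probs, key=probs.get)
        return {tok: (1.0 if tok == top else 0.0) for tok in probs}
    # log(p) stands in for the logits, since softmax ignores constant shifts.
    scaled = {tok: math.log(p) / temperature for tok, p in probs.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


for t in (0, 0.5, 1.0, 2.0):
    rescaled = apply_temperature(probs, t)
    print(t, round(rescaled["Paris"], 3))
```

At T = 1.0 the distribution is unchanged; lower values push more mass onto "Paris", and higher values spread it across the alternatives.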
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Near-deterministic: the most likely token is chosen at every step
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)

# Creative: the wording will likely differ on every call
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a poem about code"}],
)
print(response.choices[0].message.content)

Top-p limits sampling to the smallest set of tokens whose cumulative probability reaches or exceeds p.
All tokens sorted by probability:
"Paris" → 94.2% ← cumulative: 94.2%
"Lyon" → 2.1% ← cumulative: 96.3%
"a" → 1.8% ← cumulative: 98.1% ← top-p=0.98 stops here
"the" → 0.9% ← excluded
...
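With top_p=0.9, sampling is restricted to the most likely tokens that together make up 90% of the probability mass. In the example above, that is "Paris" alone, since it already accounts for 94.2%.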
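The cutoff described above can be sketched in a few lines. This is an illustrative implementation, not any provider's actual code: `top_p_filter` is a hypothetical helper, it keeps the token that crosses the threshold, and it renormalises the survivors so they sum to 1. Only the four tokens listed in the example are included, so the input sums to 0.99 and not 1.

```python
# The four tokens listed in the example; the rest of the vocabulary is omitted.
probs = {"Paris": 0.942, "Lyon": 0.021, "a": 0.018, "the": 0.009}


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalise what is left."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # the token that crosses the threshold is still kept
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


print(list(top_p_filter(probs, 0.98)))  # ['Paris', 'Lyon', 'a']
print(list(top_p_filter(probs, 0.9)))   # ['Paris']
```

Sampling then proceeds as before, but only over the tokens that survive the filter.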
| Top-p | Behaviour |
|---|---|
| 0.1 | Very conservative — only top tokens |
| 0.9 | Balanced: a common starting point when you want some filtering |
| 1.0 | All tokens considered — no filtering |
Use temperature when you want to control the overall creativity level.
Use top-p when you want to prevent very unlikely tokens from appearing.
In practice, most APIs let you set both. OpenAI and Anthropic both recommend altering one, not both, because changing both at once makes behaviour hard to predict.
# Recommended: set one, leave the other at its default
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,  # set this
    top_p=1.0,  # leave at default
    messages=[...]
)

| Use Case | Temperature | Top-p |
|---|---|---|
| Code generation | 0 - 0.2 | 1.0 |
| Factual Q&A | 0 - 0.3 | 1.0 |
| Summarization | 0.3 - 0.5 | 1.0 |
| Chatbot | 0.7 | 1.0 |
| Creative writing | 1.0 - 1.5 | 0.9 |
| Brainstorming | 1.2 - 1.5 | 0.95 |
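One way to keep these settings in a single place is a small preset lookup. The sketch below is only an illustration: `SAMPLING_PRESETS` and `sampling_params` are hypothetical names, and each range from the table is collapsed to its midpoint, which is an arbitrary choice you would tune for your own application.

```python
# Hypothetical presets derived from the table above. Where the table gives a
# range, the midpoint is used purely for illustration.
SAMPLING_PRESETS = {
    "code_generation": {"temperature": 0.1, "top_p": 1.0},
    "factual_qa": {"temperature": 0.15, "top_p": 1.0},
    "summarization": {"temperature": 0.4, "top_p": 1.0},
    "chatbot": {"temperature": 0.7, "top_p": 1.0},
    "creative_writing": {"temperature": 1.25, "top_p": 0.9},
    "brainstorming": {"temperature": 1.35, "top_p": 0.95},
}


def sampling_params(use_case):
    """Return a copy of the preset for use_case, or raise if it is unknown."""
    try:
        return dict(SAMPLING_PRESETS[use_case])
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


params = sampling_params("summarization")
print(params)  # {'temperature': 0.4, 'top_p': 1.0}
```

The returned dictionary can be unpacked into the API call with `**params`, alongside the `model` and `messages` arguments.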