Module 2 · How LLMs work · 7 min read

Temperature: the AI's dice

Why the same question can get different answers. Turn the dial yourself.

The AI rolls dice

Remember how an LLM picks the next word? It doesn't always pick the single most likely word; it picks a likely word at random, weighted by how likely each candidate is. That's a small dice roll.

Temperature is a number that controls how spicy that dice roll is.

  • Low temperature (near 0): always pick the safest, most boring word. Same answer every time.
  • Medium (around 0.7): a little variety. Feels human.
  • High (1.0 and up): chaos. Creative, weird, sometimes off the rails.
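Under the hood, that dial is simple: the model's raw scores get divided by the temperature before they're turned into probabilities. Here is a minimal Python sketch; the three words and their scores are made up for illustration, not real model output.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Pick one word index from raw model scores (logits).

    Low temperature sharpens the distribution toward the top word;
    high temperature flattens it so the long tail gets rolled too.
    """
    # Divide each raw score by the temperature, then apply softmax.
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The weighted dice roll.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

words = ["mat", "floor", "moon"]
logits = [3.0, 2.0, -1.0]  # illustrative scores, not from a real model

# Near-zero temperature: the top word wins essentially every time.
cold = [words[sample_with_temperature(logits, 0.05)] for _ in range(20)]
# High temperature: the probabilities flatten and variety creeps in.
hot = [words[sample_with_temperature(logits, 2.0)] for _ in range(20)]
print(cold)  # twenty "mat"s in a row
print(hot)
```

At temperature 0.05 the gap between the scores gets magnified twentyfold, so "mat" is picked virtually every time; at 2.0 the gap shrinks and even "moon" gets a real chance.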

See it for yourself

Move the slider. Hit the button. The exact same question runs three times. Watch the answers stay the same at low temperature and explode at high temperature.

Same question, three tries

Give me one creative name for a friendly robot. Reply with just the name.

Temperature slider: 0 (boring, same) to 1.5 (wild, different)

Two more dials: top-k and top-p

Temperature changes how the dice are rolled. top-k and top-p change which words are even allowed in the bag before the roll.

  • top-k = “only consider the top k most likely words.” If top-k is 5, the model picks only from its 5 best candidates, no matter how many it was thinking about.
  • top-p (also called nucleus sampling) = “only consider the smallest group of words whose chances add up to p.” If top-p is 0.90, the model keeps just enough top candidates that together they cover 90% of the chance, and ignores the long tail.

In other words: temperature controls the shape of the dice; top-k and top-p control the size of the dice.
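Both filters fit in a few lines of Python. This sketch uses the same illustrative cat-sentence probabilities as the demo further down; the candidate list is invented for teaching, not real model output.

```python
def top_k_filter(candidates, k):
    """Keep only the k most likely (word, probability) pairs."""
    ranked = sorted(candidates, key=lambda wp: wp[1], reverse=True)
    return ranked[:k]

def top_p_filter(candidates, p):
    """Keep the smallest top group whose probabilities add up to at least p."""
    ranked = sorted(candidates, key=lambda wp: wp[1], reverse=True)
    kept, total = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        total += prob
        if total >= p:  # the "nucleus" now covers enough of the chance
            break
    return kept

candidates = [("mat", 0.32), ("floor", 0.16), ("couch", 0.12), ("rug", 0.09),
              ("chair", 0.07), ("bed", 0.06), ("table", 0.05), ("lap", 0.04),
              ("sofa", 0.03), ("windowsill", 0.025), ("fence", 0.015), ("moon", 0.005)]

print([w for w, _ in top_k_filter(candidates, 5)])
# → ['mat', 'floor', 'couch', 'rug', 'chair']
print([w for w, _ in top_p_filter(candidates, 0.90)])
# → ['mat', 'floor', 'couch', 'rug', 'chair', 'bed', 'table', 'lap']
# (those eight together cover 91% of the chance; the rest is the ignored tail)
```

In a real sampler these filters run first, the surviving probabilities are renormalized, and only then does the temperature-weighted dice roll happen.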

Why have both?

  • top-k is simple, but it doesn’t care whether the top words are super confident or super uncertain. 5 candidates is 5 candidates.
  • top-p is smarter. If the model is very sure the next word is “Paris,” top-p shrinks the choice down to one almost automatically. If the model is unsure, top-p lets in more options.

Most AI apps use temperature plus top-p. Some also add top-k as a hard cap on the wildest tail.
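That adaptive behavior is easy to see with two made-up distributions, one confident and one torn (the numbers below are illustrative):

```python
def top_p_keep(probs, p):
    """Count how many top candidates survive nucleus sampling at threshold p."""
    kept, total = 0, 0.0
    for prob in sorted(probs, reverse=True):
        kept += 1
        total += prob
        if total >= p:
            break
    return kept

# Model is very sure (think: next word after "The capital of France is").
confident = [0.95, 0.02, 0.01, 0.01, 0.005, 0.005]
# Model is torn between many plausible words.
uncertain = [0.15, 0.14, 0.13, 0.12, 0.12, 0.12, 0.11, 0.11]

print(top_p_keep(confident, 0.90))  # → 1, the sure word alone covers 90%
print(top_p_keep(uncertain, 0.90))  # → 8, nearly the whole list stays in the bag
```

A fixed top-k would keep the same number of candidates in both situations; top-p shrinks and grows with the model's confidence.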

Try it

Below is what the AI sees as the top 12 candidates for “The cat sat on the ___.” Slide the dials. Watch which words stay in the bag.

Prompt

"The cat sat on the ___"

Below are the 12 words the model thinks are most likely to come next, with their probabilities. See how each dial decides which words are still allowed before the dice roll.

top-k dial (set to 6): only the top 6 candidates can be chosen.
top-p dial (set to 0.75): keep candidates until they together cover 75% of the chance.

word          probability   running total   top-k = 6   top-p = 0.75
mat           32.0%         32.0%           kept        kept
floor         16.0%         48.0%           kept        kept
couch         12.0%         60.0%           kept        kept
rug            9.0%         69.0%           kept        kept
chair          7.0%         76.0%           kept        kept
bed            6.0%         82.0%           kept        cut
table          5.0%         87.0%           cut         cut
lap            4.0%         91.0%           cut         cut
sofa           3.0%         94.0%           cut         cut
windowsill     2.5%         96.5%           cut         cut
fence          1.5%         98.0%           cut         cut
moon           0.5%         98.5%           cut         cut

(chair survives top-p = 0.75 because the running total first reaches 75% when chair is included.)

Try this: pull top-k all the way down to 1 (it always picks "mat" and never surprises you). Now pull top-p down to 0.30: only the likeliest word, "mat," survives, since its 32% already covers the 30%. Crank both dials up and the long tail comes back, including weird picks like "moon."
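You can check those dial settings directly in code. A tiny sketch using the demo's illustrative probabilities, with top-p = 0.95 standing in for a cranked-up setting:

```python
def surviving_words(candidates, top_p):
    """Which words stay in the bag for a given top-p threshold."""
    kept, total = [], 0.0
    for word, prob in candidates:  # list is already sorted, most to least likely
        kept.append(word)
        total += prob
        if total >= top_p:
            break
    return kept

candidates = [("mat", 0.32), ("floor", 0.16), ("couch", 0.12), ("rug", 0.09),
              ("chair", 0.07), ("bed", 0.06), ("table", 0.05), ("lap", 0.04),
              ("sofa", 0.03), ("windowsill", 0.025), ("fence", 0.015), ("moon", 0.005)]

print([w for w, _ in candidates][:1])      # top-k = 1: only "mat", never a surprise
print(surviving_words(candidates, 0.30))   # → ['mat'], its 32% alone covers 30%
print(surviving_words(candidates, 0.95))   # everything through "windowsill" survives
```

Even at a generous top-p = 0.95, the weirdest picks ("fence," "moon") stay cut; only an even higher threshold lets the full tail back in.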

Why this matters

  • When you want a fact, pick low temperature and a small top-p. Less wandering.
  • When you want a poem or brainstorming, pick a higher temperature and a wider top-p. More fun.
  • Temperature is not the same as “smartness.” A high-temperature answer isn’t smarter, just rarer.

A surprising side effect

Because of the dice roll, an LLM can give a confident answer that’s almost right but slightly wrong. It rolled a 5 when 6 was the truth. This is one of the things that makes hallucinations possible; more on those in Lesson 8.

Quick check

  1. What does temperature control?
  2. You want the AI to give you the capital of France. Best temperature?
  3. What does top-k do?
  4. What does top-p (nucleus sampling) do?
  5. Why might the same question give two different answers?