The most surprising thing about LLM temperature and top-p isn’t just that they control randomness, but that they move the LLM along a spectrum from a near-deterministic, predictable generator to a probabilistic, more creative one.

Let’s see this in action. Imagine we have a simple prompt: "The quick brown fox jumps over the lazy".

With neutral settings (temperature=1.0, top-p=1.0), the LLM samples from its unmodified probability distribution and will most likely give us a very common completion:

dog.

Now, let’s introduce a bit of control. We’ll set temperature=0.5. Values below 1.0 sharpen the distribution, so the model favors the most probable tokens even more strongly while still allowing a little variation. The output will almost always still be:

dog.

Less adventurous than before, and very predictable.

If we push temperature above 1.0, say temperature=1.5, the distribution flattens and we start seeing more unexpected but potentially interesting results:

river.
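To make the effect of temperature concrete, here is a minimal sketch. It assumes the usual formulation, in which the model’s raw scores (logits) are divided by the temperature before the softmax. The tokens and logit values are made up for illustration and do not come from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply a softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens (illustrative only).
tokens = ["dog", "cat", "fox", "river"]
logits = [2.0, 1.3, 0.6, -0.6]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in zip(tokens, probs)})
```

Running it should show the probability of the top token shrinking as the temperature rises, while the least likely token gains probability.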

Now, let’s look at top-p (also known as nucleus sampling). Instead of adjusting the probabilities of all tokens, top-p focuses on a subset. It selects the smallest set of tokens whose cumulative probability reaches or exceeds a given p value. Only tokens within this set are considered for sampling, and their probabilities are renormalized so that they sum to 1.

Let’s go back to our original prompt and try top-p=0.1. This is a very low value, meaning only the most probable tokens will be considered. The output will likely be very similar to the default or low-temperature case:

dog.

Now, let’s try a higher top-p, say top-p=0.9. This includes a wider range of tokens. If the probabilities of the top tokens are:

  • dog: 0.4
  • cat: 0.2
  • fox: 0.1
  • cow: 0.05
  • river: 0.03
  • cloud: 0.02
  • …and so on

With top-p=0.9, the model will consider dog, cat, fox, cow, river, cloud, and any other tokens that bring the cumulative probability up to 0.9. This allows for more diversity than a low top-p or low temperature. We might see:

cat.

or

fox.

or even, occasionally, a low-probability token from the tail of that set:

specter.
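The top-p selection can be sketched in a few lines. The function below is an illustrative helper, not a library API. It uses the six probabilities from the list above, and it fills in the "…and so on" part with a made-up tail (the `tail_*` tokens) so that the distribution sums to 1.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize the survivors."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Probabilities from the example list.
probs = {"dog": 0.4, "cat": 0.2, "fox": 0.1,
         "cow": 0.05, "river": 0.03, "cloud": 0.02}

# Hypothetical tail standing in for "...and so on" (assumed values).
tail = [0.02] * 3 + [0.015] * 4 + [0.01] * 8
probs.update({f"tail_{i}": pr for i, pr in enumerate(tail)})

narrow = top_p_filter(probs, 0.1)
wide = top_p_filter(probs, 0.9)
print("top-p=0.1 keeps:", list(narrow))
print("top-p=0.9 keeps:", len(wide), "tokens")
```

With top-p=0.1, dog alone already covers more than 0.1 of the probability mass, so it is the only candidate left. With top-p=0.9, the six listed tokens only add up to 0.80, so some tail tokens have to be included as well.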

The key is that temperature rescales all token probabilities (in practice, by dividing the model’s logits by the temperature before the softmax), making the distribution sharper (low temp) or flatter (high temp). top-p, on the other hand, truncates the distribution, keeping the most likely tokens and discarding the rest.

Often, you’ll use them together. A common pattern is to use a moderate temperature (e.g., 0.7) to slightly sharpen the distribution, and then a high top-p (e.g., 0.9) to trim the unlikely tail while still leaving a reasonable set of diverse options. This balances creativity with coherence. For instance, with temperature=0.7 and top-p=0.9 on "The quick brown fox jumps over the lazy", we might get:

stream.

or

shadow.
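Here is a rough sketch of how the two settings might be combined in a single sampling step. It assumes temperature is applied first and top-p second, which is one common ordering; actual inference libraries may differ. The `sample_token` function and the logits are hypothetical.

```python
import math
import random

def sample_token(logits, temperature=0.7, top_p=0.9, rng=None):
    """Apply temperature, keep the top-p nucleus, then sample one token."""
    rng = rng or random.Random()

    # 1. Temperature: rescale logits, then softmax (max subtracted for stability).
    peak = max(logits.values())
    weights = {tok: math.exp((x - peak) / temperature) for tok, x in logits.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # 2. Top-p: keep the smallest high-probability set that reaches top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        cumulative += pr
        if cumulative >= top_p:
            break

    # 3. Sample from the nucleus; random.choices normalizes the weights.
    candidates = [tok for tok, _ in nucleus]
    cand_weights = [pr for _, pr in nucleus]
    return rng.choices(candidates, weights=cand_weights, k=1)[0]

# Hypothetical logits (illustrative only, not from a real model).
logits = {"dog": 3.0, "stream": 1.5, "shadow": 1.4, "cat": 1.0, "specter": -2.0}

rng = random.Random(0)  # fixed seed so repeated runs are comparable
samples = [sample_token(logits, rng=rng) for _ in range(10)]
print(samples)
```

Passing a seeded `random.Random` keeps runs reproducible, which is handy when comparing settings side by side.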

The problem these parameters solve is the LLM’s inherent tendency towards the most statistically probable tokens, which can lead to repetitive or bland output. By adjusting temperature and top-p, you directly influence the trade-off between determinism and creativity, allowing you to tune the LLM’s response for specific tasks. For creative writing, you’d lean towards higher values; for factual recall, lower values.

When you set temperature to 0, the model picks the single highest-probability token at each step (greedy decoding), making the output as predictable as possible. Even so, this doesn’t necessarily make the output identical every time: the serving stack might still introduce some nondeterminism (for example, small floating-point differences between runs), and the output can be sensitive to subtle differences in input tokenization.
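Since dividing by a temperature of 0 is undefined, samplers typically treat it as a special case. The sketch below assumes that convention; `next_token` is a hypothetical helper and the logits are made up.

```python
import math
import random

def next_token(logits, temperature, rng=None):
    """Greedy argmax when temperature is 0, temperature sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(logits, key=logits.get)
    rng = rng or random.Random()
    peak = max(logits.values())
    weights = [math.exp((x - peak) / temperature) for x in logits.values()]
    return rng.choices(list(logits), weights=weights, k=1)[0]

# Hypothetical logits (illustrative only).
logits = {"dog": 3.0, "cat": 1.0, "river": -1.0}

print([next_token(logits, 0) for _ in range(5)])
```

In this toy setting the greedy branch returns the same token on every call. Any run-to-run variation seen in practice comes from outside this selection step.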

The next concept you’ll encounter is how these parameters interact with the prompt itself, and how different model architectures might interpret probability distributions slightly differently.
