Module 2 · How LLMs work · 5 min read

Tokens: the puzzle pieces of language

AIs don't see letters or whole words. They see "tokens." Let's break some apart.

A weird truth

When you read this sentence, you see letters grouped into words.

When an AI reads it, it doesn’t see letters or words. It sees tokens: chunks that are sometimes whole words, sometimes pieces of words, and sometimes just a space and a comma.

Try it yourself

In the interactive version, you can type anything in the box and watch it split into colored pieces, one per token, with each token's ID number underneath. For example, the sentence "Hello! My name is Mia and I love space." becomes 11 tokens:

    Hello → 13225, ! → 0, My → 3673, name → 1308, is → 382, Mia → 82691, and → 326, I → 357, love → 3047, space → 4918, . → 13

The LLM sees your words as these numbered pieces, not as letters.

Things to notice

  • Common words like “the” are usually one token.
  • Rare or long words like “antidisestablishmentarianism” get chopped into many tokens.
  • A space and a word stick together: " cat" is often one token.
  • Numbers, emoji, and weird characters can each be their own token.
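The patterns above can be sketched in code. Real LLMs use learned vocabularies (built with algorithms like byte-pair encoding), but a greedy "longest match wins" pass over a tiny hand-made vocabulary shows the idea; every vocabulary entry and ID below is made up for illustration.

```python
# Toy tokenizer sketch. The vocabulary and IDs are invented for this
# example; real models learn vocabularies of tens of thousands of pieces.
TOY_VOCAB = {
    "anti": 0, "dis": 1, "establish": 2, "ment": 3, "arian": 4, "ism": 5,
    " the": 6, " cat": 7, "the": 8,
}

def tokenize(text, vocab):
    """Greedy longest-match: at each position, take the longest
    vocabulary piece that fits; unknown characters become their
    own single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # fallback: one character
            i += 1
    return tokens

# A common word is one token; a space sticks to the word after it:
print(tokenize(" the cat", TOY_VOCAB))
# A rare, long word gets chopped into several pieces:
print(tokenize("antidisestablishmentarianism", TOY_VOCAB))
```

Notice that `" the"` and `"the"` are different tokens: the leading space is part of the piece, which is why the same word can tokenize differently depending on where it sits in a sentence.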

Why does this matter?

Because AI models are priced (and limited) by tokens, not words.

  • A short kid’s poem might be 60 tokens.
  • A long school essay might be 1,000 tokens.
  • An LLM might be able to “hold in mind” 100,000 tokens at a time, like a really big notebook.
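The numbers above can be estimated with a common rule of thumb: in English, one token is roughly 4 characters, or about three-quarters of a word. Real counts depend on the specific model's tokenizer, so treat this as a back-of-the-envelope sketch, not an exact answer.

```python
# Rough token estimates using the "1 token ≈ 0.75 English words"
# rule of thumb. Actual counts vary by tokenizer and by text.
def estimate_tokens(word_count: int) -> int:
    return round(word_count / 0.75)

poem_tokens = estimate_tokens(45)      # a short poem of ~45 words
essay_tokens = estimate_tokens(750)    # a long essay of ~750 words
context_window = 100_000               # tokens the model can "hold in mind"

print(f"Poem: about {poem_tokens} tokens")
print(f"Essay: about {essay_tokens} tokens")
print(f"Essays that fit at once: about {context_window // essay_tokens}")
```

This is also why pricing pages quote costs per thousand (or million) tokens rather than per word.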

It also explains weird mistakes. If you ask an AI “How many letter ‘r’s are in ‘strawberry’?” it sometimes gets it wrong, because it doesn’t really see letters. It sees token chunks. (Try it!)
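The strawberry mistake is easy to see side by side. We can count letters because we see characters; the model receives token IDs instead, and the letters inside a token are hidden from it. The split and IDs below are hypothetical, just to illustrate the mismatch.

```python
# What we see: characters, so counting letters is trivial.
word = "strawberry"
print(word.count("r"))   # 3

# What the model sees: a short list of chunk IDs (made up here).
# The three r's are buried inside the chunks, not visible as letters.
hypothetical_chunks = ["str", "aw", "berry"]
hypothetical_ids = [496, 675, 15717]
print(hypothetical_ids)
```

The chunks still spell the same word when glued back together, but the model is never handed the individual letters, so it has to guess at anything letter-level.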

Quick check

  1. Which is closest to the truth about how an AI “sees” text?
  2. A long, rare word usually becomes…
  3. Why does an AI sometimes miscount letters in a word?