Tokens: the puzzle pieces of language
AIs don’t see letters or whole words. They see “tokens.” Let’s break a sentence into some.
A weird truth
When you read this sentence, you see letters grouped into words.
When an AI reads it, it doesn’t see letters or words. It sees tokens: chunks that are sometimes whole words, sometimes pieces of words, and sometimes just a space or a comma.
Try it yourself
Type anything in the box. Each colored piece is one token, with its secret ID number underneath.
The LLM sees your words as these numbered pieces, not as letters.
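Under the hood, the demo box does something like the sketch below. It uses tiktoken, one real, open-source tokenizer library; different models use different tokenizers, so the exact pieces and ID numbers you see may differ.

```python
# A sketch of what the demo box does, using the open-source tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common tokenizer vocabulary

text = "Tokens are the puzzle pieces of language."
token_ids = enc.encode(text)  # the "secret ID numbers"

# Show each numbered piece, like the colored chunks above.
for token_id in token_ids:
    print(token_id, repr(enc.decode([token_id])))
```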
Things to notice
- Common words like “the” are usually one token.
- Rare or long words like “antidisestablishmentarianism” get chopped into many tokens.
- A space and a word often stick together: “ cat” is frequently one token.
- Numbers, emoji, and unusual characters can each be their own token. (You can test all of these with the sketch below.)
Why does this matter?
Because AI models are priced (and limited) by tokens, not words. (The sketch after this list shows how to count them.)
- A short kid’s poem might be 60 tokens.
- A long school essay might be 1,000 tokens.
- An LLM might be able to “hold in mind” 100,000 tokens at a time, like a really big notebook.
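Counting tokens takes only a few lines of code. A sketch, assuming tiktoken once more; counts vary by tokenizer, so treat the poem and essay numbers above as rough estimates.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

poem = "Roses are red, violets are blue, I speak in tokens, and so do you."
print(len(poem.split()), "words")        # what people count
print(len(enc.encode(poem)), "tokens")   # what the model counts (and what you pay for)
```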
It also explains weird mistakes. If you ask an AI “How many Rs are in ‘strawberry’?” it sometimes gets it wrong, because it doesn’t really see letters; it sees token chunks. (Try it!)
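To see why, print the chunks the model actually receives for “strawberry.” A sketch with tiktoken; the exact split depends on the tokenizer.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("strawberry")
print([enc.decode([i]) for i in ids])
# The model receives these chunks, not ten separate letters, so counting
# the letter "r" means reasoning across chunk boundaries.
```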
Quick check
1. Which is closest to the truth about how an AI “sees” text?
2. A long, rare word usually becomes…
3. Why does an AI sometimes miscount letters in a word?