Embeddings: turning meaning into numbers
How an AI knows that "dog" and "puppy" are friends, but "dog" and "spaceship" are not.
The big question
How does an AI “know” that dog and puppy mean similar things, but dog and spaceship don’t?
It cheats, with numbers.
Each word gets coordinates
When the AI sees a word (or sentence), it turns it into a list of numbers. Like a secret code. A typical code has 768 numbers in it. You can imagine each word as a point floating in a giant 768-dimensional space.
Words with similar meanings end up close together in that space. Words with different meanings sit far apart.
These lists of numbers are called embeddings.
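If you're curious what that looks like in code, here is a minimal sketch. It assumes the sentence-transformers library and the all-mpnet-base-v2 model (which happens to produce exactly 768 numbers per text); any embedding model works the same way: text in, list of numbers out.

```python
# Minimal sketch, assuming the sentence-transformers library is installed.
from sentence_transformers import SentenceTransformer

# all-mpnet-base-v2 is one popular model that outputs 768-dimensional embeddings.
model = SentenceTransformer("all-mpnet-base-v2")

embedding = model.encode("puppy")
print(len(embedding))   # 768 numbers
print(embedding[:5])    # the first few coordinates, e.g. something like [0.01, -0.12, ...]
```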
Why is this magical?
Because now “similar meaning” becomes “close in space”, and computers are really good at measuring distance.
Embeddings power:
- Search: Google can find a page about “puppy training” even if you typed “how to teach my baby dog.”
- Recommendations: Spotify knows two songs feel similar even if they share zero words.
- Chatbot memory: when an AI “remembers” what you said, it often uses embeddings.
How “close” is measured
The trick we use to measure closeness is called cosine similarity. Don’t worry about the name. It just compares two of those long number lists and gives you back a single score:
- +1 means almost identical meaning.
- 0 means totally unrelated.
- −1 means opposite meaning.
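If you like seeing the arithmetic, the score is just a bit of math on the two number lists: multiply them together position by position, add it all up, then divide by how "long" each list is. A tiny sketch with NumPy (the function name here is just ours):

```python
import numpy as np

def cosine_similarity(a, b):
    """Compare two embeddings: +1 = same direction, 0 = unrelated, -1 = opposite."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-number "embeddings", just to show the scale of the score:
print(cosine_similarity([1, 2, 3], [1, 2, 3]))     #  1.0  (identical)
print(cosine_similarity([1, 0, 0], [0, 1, 0]))     #  0.0  (unrelated directions)
print(cosine_similarity([1, 2, 3], [-1, -2, -3]))  # -1.0  (opposite)
```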
Pick any two sources below. We turn each one into its embedding (a real list of numbers from a real model), then compute the similarity score. Try comparing the roller coaster review with itself, then with the dinosaur book.
Source 1 (dinosaur book): Long ago, giant lizards called dinosaurs walked the Earth. Some were as tall as houses, and some had feathers like birds. The T-Rex was a fierce hunter, but the Brachiosaurus only ate leaves.
Source 2 (roller coaster review): The Velocicoaster launches you from 0 to 70 mph in just two seconds. Riders experience four inversions, including a top hat and a Norwegian loop. It is the most intense roller coaster in the park.
Tip: compare the dinosaur book with itself, then with the roller coaster review. The same text gives a score of 1.0000; two unrelated topics drop way down.
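Putting the two pieces together (again assuming sentence-transformers and NumPy), comparing the two sources might look like this:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # assumption: any embedding model works here

dinosaurs = "Long ago, giant lizards called dinosaurs walked the Earth..."
coaster = "The Velocicoaster launches you from 0 to 70 mph in just two seconds..."

# Encode both texts at once; each row is one 768-number embedding.
e1, e2 = model.encode([dinosaurs, coaster])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(e1, e1))  # 1.0000 -- the same text compared with itself
print(cosine_similarity(e1, e2))  # much lower -- dinosaurs vs roller coasters
```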
See it in 2D
You can’t draw 768 dimensions. But you can squish them down to 2 with math (a technique called PCA, short for principal component analysis), and watch how words cluster.
Try it: add a few words. We turn each one into 768 numbers, then squash those down to 2D, so similar meanings sit close together. The picture should look like clouds: animals near animals, vehicles near vehicles, feelings near feelings.
Things to try
- “dog”, “cat”, “wolf” vs “car”, “truck”, “bus”: two separate clouds.
- Add “puppy” near the dog cloud. Watch it land where you’d expect.
- Add “happy”, “sad”, “angry”: emotions form their own cluster.
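Here is a rough sketch of that squashing step, assuming sentence-transformers for the embeddings and scikit-learn's PCA for the 768-to-2 squeeze:

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dimensional embeddings (assumption)

words = ["dog", "cat", "wolf", "puppy", "car", "truck", "bus", "happy", "sad", "angry"]
embeddings = model.encode(words)  # shape: (10, 768)

# Squash 768 dimensions down to 2 so the points can be drawn on a flat picture.
points_2d = PCA(n_components=2).fit_transform(embeddings)  # shape: (10, 2)

for word, (x, y) in zip(words, points_2d):
    print(f"{word:>6}: ({x:+.2f}, {y:+.2f})")
# Plot these points and the animal words should huddle together,
# the vehicles in another cluster, and the emotions in a third.
```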
Quick check
1. An embedding is…
2. Why are embeddings useful?
3. Where might embeddings be used?