Reasoning models: AI that thinks before it answers
Why some 2026 models pause, scribble in a notebook, and then answer. And why that's a big deal.
A new kind of model
Until 2024, every LLM you used was a fast talker. Ask a question, get an answer right away, one token at a time. No real “thinking,” just very fast next-word guessing.
In late 2024, something changed. A new class of models, called reasoning models, learned to pause and think before answering. The first famous one was OpenAI’s o1, then o3, then o4-mini. Anthropic launched extended thinking in Claude. Google added Deep Think in Gemini. DeepSeek released R1, an open model whose thinking you can actually read.
What “thinking” really is
Inside, a reasoning model does the same thing you do when a math problem is hard:
- Reads the question.
- Writes a private scratch pad: “Hmm, what do I know? Let me try this. No, that doesn’t work. What about…”
- Catches its own mistakes and corrects them.
- Only then writes the visible answer.
That scratch pad is sometimes hidden (o3), sometimes shown (DeepSeek R1’s <think> tags), sometimes a separate inspectable block (Claude). But the principle is the same: spend more compute thinking before committing to an answer.
It’s the difference between blurting out the first thing that comes to mind and pausing to actually work it through.
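With an open model like DeepSeek R1, you can watch this happen: the scratch pad arrives wrapped in `<think>` tags ahead of the final answer. Here’s a minimal Python sketch that splits the two; the `raw_output` string is a made-up stand-in for a real model response:

```python
import re

# A stand-in for raw text from DeepSeek R1, which wraps its
# scratch pad in <think> tags before the visible answer
raw_output = (
    "<think>The user wants 17 * 24. 17 * 20 = 340, 17 * 4 = 68, "
    "340 + 68 = 408.</think>"
    "17 × 24 = 408."
)

# Separate the private scratch pad from the visible answer
match = re.search(r"<think>(.*?)</think>(.*)", raw_output, re.DOTALL)
thinking = match.group(1).strip()
answer = match.group(2).strip()

print("THINKING:", thinking)
print("ANSWER:", answer)
```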
See the difference
Same problem, same model. The only difference is whether we let it think first.
A jar has 3 red, 2 blue, and 5 green marbles. You take out one without looking, then a second without putting the first back. What's the chance both are green?
Fast answer: 1/4 (about 25%).
Confident, but wrong.
Now the same model, with thinking turned on. Its scratch pad: there are 10 marbles in total, so the first draw is green with probability 5/10. That leaves 4 green marbles among the remaining 9, so the second draw is green with probability 4/9. Multiply: 5/10 × 4/9 = 20/90 = 2/9, about 22%. Correct.
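Don’t take the model’s word for it; the arithmetic is easy to check. A quick Python verification, exact and by simulation:

```python
from fractions import Fraction
import random

# Exact answer: P(first is green) × P(second is green | first was green)
p = Fraction(5, 10) * Fraction(4, 9)
print(p, "≈", float(p))  # 2/9 ≈ 0.2222

# Monte Carlo sanity check: draw two marbles without replacement, many times
jar = ["red"] * 3 + ["blue"] * 2 + ["green"] * 5
trials = 100_000
hits = sum(random.sample(jar, 2) == ["green", "green"] for _ in range(trials))
print(hits / trials)  # hovers around 0.222
```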
Why this matters
For tasks like:
- Multi-step math
- Logic puzzles
- Writing or debugging code
- Anything that requires planning
Reasoning models are much better than fast-talking models. OpenAI’s o3 set a new record on ARC-AGI, one of the hardest reasoning benchmarks ever built. DeepSeek’s R1, an open-weights model, matches much larger closed models on the same kinds of problems.
For tasks like “draft me an email” or “summarize this paragraph,” a fast-talking model is still better. You don’t need the AI to deliberate for 30 seconds about how to write “Hi Alex, hope your weekend was good!”
The tradeoff
Thinking is not free.
- Time. A reasoning model can take ten seconds to a minute to answer, where a fast model starts streaming immediately.
- Money. The hidden thinking is billed as output tokens, even though you may never see them.
- Over-thinking. On simple questions, long deliberation can actually make the answer worse.
That’s why modern AI products often let you choose: fast mode for chitchat, thinking mode for hard problems. Most reasoning models also expose a dial you can turn up or down (“think a little” vs. “think a lot”) depending on how hard the question is.
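For example, OpenAI’s Python SDK exposes exactly this dial as a `reasoning_effort` parameter on its o-series models, and the usage stats report how many hidden reasoning tokens you were billed for. Model names and defaults change fast, so treat this as a sketch of the pattern, not a reference:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="o3-mini",          # a reasoning model; swap in whatever is current
    reasoning_effort="high",  # the dial: "low", "medium", or "high"
    messages=[{
        "role": "user",
        "content": "A jar has 3 red, 2 blue, and 5 green marbles. "
                   "Two are drawn without replacement. P(both green)?",
    }],
)

print(response.choices[0].message.content)

# You pay for the hidden scratch pad even though you never see it:
details = response.usage.completion_tokens_details
print("reasoning tokens billed:", details.reasoning_tokens)
```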
The takeaway
A fast-talking LLM is a confident improviser. A reasoning model is a careful problem-solver.
Use the first for chatting, the second when getting it right matters.
This is one of the biggest shifts in AI since LLMs themselves arrived. It also points toward the next era: models that don’t just think before answering, but plan, act, and keep thinking across a whole agentic loop, which is exactly the world Module 4 introduced.
Quick check
1. What's special about a reasoning model?
2. Which task fits a reasoning model better than a fast-talking model?
3. What's a real downside of reasoning models?
And that’s the course
You now understand the basics, the inner workings, the limits, the action-taking layer, and the newest 2026 ideas: multimodal models, RAG, and reasoning. That’s more than enough to follow any AI news, judge any AI product, and start building.
Stay curious. Stay skeptical. Stay in the driver’s seat.