The One-Sentence Answer

ChatGPT predicts the next token (a small chunk of text) over and over again, using a neural network with billions of parameters that was trained on an enormous slice of human writing. That one loop, repeated hundreds of times per reply, produces everything from sonnets to working Python code.
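To make the "one loop" concrete, here is a minimal sketch in Python. The lookup table `NEXT_TOKEN` and the function names are hypothetical stand-ins: a real model computes its prediction with a neural network rather than a dictionary, but the outer loop has the same shape.

```python
# Toy next-token loop. NEXT_TOKEN is a hypothetical stand-in for a real
# model, which would score every possible token with a neural network.
NEXT_TOKEN = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def predict_next(tokens):
    """Return the next token given everything generated so far."""
    return NEXT_TOKEN.get(tokens[-1], "<end>")

def generate(prompt_tokens, max_new_tokens=10):
    """Repeatedly predict one token and append it to the running text."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        if nxt == "<end>":
            break
        tokens.append(nxt)  # the new token becomes part of the next input
    return tokens

print(generate(["<start>"]))  # ['<start>', 'the', 'cat', 'sat', 'down']
```

The point to notice is that each predicted token is fed back in as input for the next prediction, which is why a reply appears one piece at a time.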

The rest of this article unpacks what that actually means, so you walk away with a genuine mental model, not just buzzwords.

Step 1: Text Gets Chopped Into Tokens

Before ChatGPT processes a single word, your message is split into tokens, the raw units the model works with. A token is roughly 3 to 4 characters of English text on average, so common short words ("the", "is") are usually one token, while longer or rarer words get split into pieces. The word "unbelievable" might become un + bel + iev + able: four tokens.

Each token maps to a number (an ID), because neural networks speak math, not English. You can play with a live tokenizer in Lesson 8: Transformers & LLMs to see exactly how your own sentences get sliced up.
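As a rough illustration of text becoming IDs, the sketch below splits a string by greedy longest-match against a tiny made-up vocabulary. Both the `VOCAB` table and the matching rule are assumptions chosen for readability; production tokenizers typically learn their vocabulary with byte-pair encoding and will split real text differently.

```python
# Hypothetical toy vocabulary: token string -> integer ID.
VOCAB = {"un": 0, "bel": 1, "iev": 2, "able": 3, "the": 4, " ": 5}

def tokenize(text, vocab=VOCAB):
    """Split text into vocabulary pieces using greedy longest-match."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces

def encode(text, vocab=VOCAB):
    """Convert text to the list of integer IDs the model would receive."""
    return [vocab[piece] for piece in tokenize(text, vocab)]

print(tokenize("unbelievable"))  # ['un', 'bel', 'iev', 'able']
print(encode("unbelievable"))    # [0, 1, 2, 3]
```

The model never sees the letters, only the list of integers on the last line.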

Analogy: Think of tokens like Scrabble tiles. You hand the model a bag of tiles representing your sentence, and it has to figure out what comes next, based on every tile it can see.
Text example | Approximate tokens
"ChatGPT" | 3 tokens: Chat · G · PT
"Hello, world!" | 4 tokens
A 750-word essay | ~1,000 tokens
One page of a novel | ~600 to 700 tokens

Step 2: The Transformer (How Tokens "Talk" to Each Other)

Once your text is tokenized, it passes through a Transformer, the architecture that powers ChatGPT, Gemini, Claude, and most modern AI language systems. The key idea inside a Transformer is called self-attention, and it's what makes these models surprisingly good at picking up meaning from context.

Self-attention lets every token in a sentence look at every other token and ask: "How relevant are you to understanding what I mean?" The model assigns weights (higher for tokens that matter, lower for ones that don't) and uses those weights to build a richer representation of each token in its context.
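The weighting described above can be sketched with scaled dot-product attention, the standard formulation used in Transformers. The two-dimensional vectors below are invented for illustration, and the same vectors are reused as queries, keys, and values; a real model learns separate projections for each and works with far larger vectors.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every token to this one, scaled by sqrt(dimension).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # New representation: weighted average of all value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = ["animal", "street", "it"]
# Hypothetical embeddings: "it" is deliberately placed close to "animal".
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

outputs, weights = self_attention(vectors, vectors, vectors)
for tok, w in zip(tokens, weights[2]):
    print(f'"it" attends to {tok!r} with weight {w:.2f}')
```

With these made-up vectors, the weight from "it" to "animal" comes out larger than the weight from "it" to "street", which is the behaviour the paragraph describes.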

Analogy: Consider the sentence: "The animal didn't cross the street because it was too tired." What does "it" refer to: the animal or the street? You instantly know it's the animal, because "tired" fits a living thing. Self-attention does the same job computationally: the token "it" attends strongly to "animal" and weakly to "street", resolving the ambiguity. You can explore interactive attention maps in Lesson 8.

This is a massive leap over older methods. Before Transformers, the dominant models were recurrent networks that processed text sequentially, one word at a time, and often lost track of context from earlier in a long sentence. Self-attention processes all tokens in parallel and can connect words hundreds of positions apart. If you want the linguistic background, Lesson 7 on Language & NLP covers how earlier approaches worked.

Step 3: What "Large" Actually Means

The "L" in LLM stands for Large, and it's not false advertising. Models of this kind have billions to hundreds of billions of parameters (GPT-3, a predecessor of the models behind ChatGPT, has 175 billion): numerical values, think adjustable dials, spread across many layers of the Transformer. During training, every dial is tuned to minimise one thing: the error on predicting the next token, across an enormous corpus of text from the internet, books, code, and more.
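The "error on predicting the next token" is commonly measured with cross-entropy loss: the negative log of the probability the model assigned to the token that actually came next. The article does not name a specific loss, so treat the sketch below as the usual textbook choice, with a made-up three-word vocabulary and made-up scores.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Cross-entropy: -log(probability given to the true next token)."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

vocab = ["cat", "dog", "mat"]       # hypothetical vocabulary
target = vocab.index("mat")         # the token that really came next

untrained = [0.0, 0.0, 0.0]         # no preference: each token gets 1/3
better = [0.0, 0.0, 3.0]            # scores now favour "mat"

print(round(next_token_loss(untrained, target), 4))  # 1.0986, i.e. ln(3)
print(next_token_loss(better, target) < next_token_loss(untrained, target))  # True
```

Training nudges the dials so that this number, averaged over the whole corpus, goes down.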

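Training happens in two main stages: pre-training on that corpus, followed by fine-tuning, which includes RLHF (reinforcement learning from human feedback).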

Key takeaway: Pre-training gives ChatGPT its knowledge of the world. Fine-tuning and RLHF give it its personality and safety guardrails. Both matter, and neither is simple.

The Context Window: How Much Can It "See"?

Every conversation you have with ChatGPT is fed in as a block of tokens. The context window is the maximum number of tokens the model can look at in one go: its working memory, so to speak. Older versions had windows of around 4,000 tokens (roughly 3,000 words). Newer models support 128,000 tokens or more.

When a conversation exceeds the context window, the model can no longer "see" the earliest messages. It hasn't forgotten in the human sense; those tokens are simply no longer part of the input it is given. This is why very long chats can feel like the model loses track of what was said at the start.
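One simple way to picture this is a sliding window that keeps only the most recent tokens. This is an assumption made for illustration: real chat products may handle overflow differently (for example by summarising older turns). The word estimate uses the rough ratio from the table earlier, 750 words to about 1,000 tokens.

```python
WORDS_PER_TOKEN = 0.75  # rough average: 750 words is about 1,000 tokens

def approx_words(num_tokens):
    """Estimate how many English words fit in a given number of tokens."""
    return round(num_tokens * WORDS_PER_TOKEN)

def fit_to_context(token_ids, context_window):
    """Keep only the most recent tokens that fit in the window."""
    if context_window <= 0:
        return []
    return token_ids[-context_window:]

conversation = list(range(1, 11))        # ten token IDs, oldest first
print(fit_to_context(conversation, 4))   # [7, 8, 9, 10]
print(approx_words(4000))                # 3000
```

Tokens 1 to 6 are not damaged or erased; they are just left out of what the model receives on the next turn.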

Why ChatGPT Makes Things Up (Hallucination)

This is the most important limitation to understand. ChatGPT is a probability machine, not a search engine or fact database. At every step it predicts the most plausible-sounding next token, and "plausible-sounding" is not the same as "true".

When the model doesn't have reliable information about something, it doesn't know to stop and say "I don't know." Instead it continues generating tokens that fit the pattern of a confident, fluent answer, producing text that sounds authoritative but can be completely wrong. This is called hallucination.
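A small sketch shows why there is no natural stopping point. Turning scores into probabilities always yields a distribution that sums to 1, so sampling always returns some token, whether the model's scores are sharply peaked or nearly flat. The candidate tokens and scores below are invented for illustration and do not come from any real model.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Higher entropy means the model is less sure which token is next."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_next(candidates, logits, rng):
    """Pick one candidate at random, weighted by its probability."""
    probs = softmax(logits)
    return rng.choices(candidates, weights=probs, k=1)[0]

candidates = ["1889", "1887", "1901"]   # hypothetical next tokens
well_known = [6.0, 1.0, 0.5]            # one answer clearly dominates
obscure = [0.2, 0.1, 0.0]               # nearly flat: the model is guessing

rng = random.Random(0)
for label, logits in [("well-known", well_known), ("obscure", obscure)]:
    probs = softmax(logits)
    token = sample_next(candidates, logits, rng)
    print(label, "->", token, "| entropy:", round(entropy(probs), 2))
```

In both cases a token comes out and the sentence continues just as fluently; only the entropy figure, which the reader never sees, reveals that the second answer was close to a coin toss.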

Key takeaway: Treat ChatGPT outputs the way you'd treat a confident friend who reads a lot but doesn't cite sources. Useful, often accurate, but always worth verifying for anything consequential.

What ChatGPT Is Not

A few common misconceptions are worth clearing up directly:

For a broader look at what AI can generate, see Lesson 9: Generative AI and our article What Is Generative AI?

Frequently Asked Questions

Does ChatGPT understand what it says?

Not in the way humans do. ChatGPT has no internal model of the world, no intentions, and no awareness of meaning. It has learned extraordinarily rich statistical relationships between tokens, which can produce outputs that look like understanding. Whether that counts as a form of understanding is a genuine philosophical debate, but practically speaking: don't assume it grasps context the way a person would.

Why does ChatGPT make things up?

Because it's predicting plausible text, not retrieving verified facts. When the training data doesn't strongly constrain the answer, the model fills the gap with whatever token sequence fits the pattern of a fluent, confident response. There's no built-in "I'm not sure" alarm, though fine-tuning has improved this. Always verify important factual claims independently.

What is a token?

A token is the smallest unit of text the model works with: roughly 3 to 4 characters, or about three-quarters of a word on average. Your entire conversation is converted into a sequence of token IDs (numbers) before the model ever runs. Tokens are not words: punctuation, spaces, and word fragments all count as tokens too.

Is ChatGPT the same as AGI?

No. AGI (Artificial General Intelligence) refers to a system that can learn and perform any intellectual task a human can, with genuine flexibility and understanding. ChatGPT is a large language model: exceptional at the text tasks it was trained for, but lacking the autonomous reasoning, goal-setting, and broad adaptability that AGI would require. Researchers disagree about how close to (or far from) AGI we are, but today's LLMs, impressive as they are, are not it.

Ready to go deeper? In Lesson 8 you can see tokens split in real time, watch attention weights shift across a sentence, and build your own intuition for how Transformers work.