What Does "Generative" Actually Mean?
Most AI you've encountered is discriminative: it looks at something that already exists and puts a label on it. A spam filter reads your email and decides "spam or not spam." An image classifier looks at a photo and says "cat" or "dog." These systems are incredibly useful, but they never produce anything new.
Generative AI does the opposite. Instead of labelling existing content, it creates brand-new content from scratch: text, images, audio, video, even code. ChatGPT writes an essay. Midjourney paints a fantasy landscape. GitHub Copilot suggests a function. Suno composes a pop song. None of that content existed before you asked.
Think of the difference this way: a discriminative model is a wine critic who can tell you whether a wine is good or bad. A generative model is the winemaker who produces a new vintage, one that has never existed before.
The Main Families of Generative AI
Under the hood, there are several distinct approaches to generating content. The three you'll hear about most are Large Language Models (LLMs), GANs, and Diffusion Models.
Large Language Models: Next-Token Prediction at Scale
LLMs like the one behind ChatGPT generate text by doing one surprisingly simple thing, over and over: predicting the most likely next word (technically "token") given everything that came before it. Feed it "The cat sat on the…" and it assigns probabilities to thousands of possible next words ("mat," "floor," "roof"), then samples one, then predicts the word after that, and so on.
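The predict-sample-repeat loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real LLM is implemented: `TOY_MODEL`, `next_token_probs`, and `generate` are hypothetical names, the probabilities are made up, and the toy only looks at the last word, whereas a real model scores the next token with a neural network that reads the whole context.

```python
import random

# Hypothetical toy "model": maps the last word to next-word probabilities.
# A real LLM computes these probabilities with a neural network over the full context.
TOY_MODEL = {
    "the": {"cat": 0.5, "mat": 0.3, "roof": 0.2},
    "cat": {"sat": 0.9, "the": 0.1},
    "sat": {"on": 1.0},
    "on": {"the": 1.0},
    "mat": {"<end>": 1.0},
    "roof": {"<end>": 1.0},
}


def next_token_probs(context):
    """Return the next-token distribution given the tokens so far."""
    return TOY_MODEL[context[-1]]


def generate(prompt, max_new_tokens=10, seed=0):
    """Repeatedly sample one next token and append it to the context."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        candidates = list(probs)
        weights = [probs[c] for c in candidates]
        token = rng.choices(candidates, weights=weights, k=1)[0]
        if token == "<end>":  # the toy model signals that the sentence is done
            break
        tokens.append(token)
    return tokens


print(" ".join(generate(["the", "cat", "sat", "on", "the"])))
```

Changing `seed` changes which words get sampled, which is why the same prompt can produce different continuations.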
The "large" part matters. These models are trained on enormous slices of the internet and books, which gives them a rich statistical sense of how language, facts, and reasoning flow. To go deeper on how transformers power this process, see our Lesson 8: Transformers & LLMs explainer. And if you're curious specifically about ChatGPT, check out How Does ChatGPT Work?
GANs: The Forger and the Detective
Generative Adversarial Networks, or GANs, were the breakthrough behind eerily realistic "this person does not exist" portraits. They work through a competitive game between two neural networks:
- The Generator (the Forger): tries to create fake images convincing enough to fool the other network.
- The Discriminator (the Detective): tries to tell real images apart from the Forger's fakes.
GANs were dominant in the early days of AI-generated faces and artistic style transfer. Their weakness is training instability: the two networks can fall out of balance, leading to repetitive or nonsensical output.
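To make the Forger-versus-Detective game a little more concrete, here is a minimal sketch of the two scores each network tries to minimise, using the standard binary cross-entropy form of the GAN objective. The function names and the example probabilities are hypothetical; a real GAN would compute these losses over batches of images and use them to update two neural networks.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Detective's loss: low when real samples score high and fakes score low.

    d_real / d_fake are the Detective's estimated probabilities (strictly
    between 0 and 1) that each real / generated sample is real.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Forger's loss (non-saturating form): low when the Detective is fooled."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical scores early in training: the Detective easily spots fakes.
early = {"d_real": [0.9, 0.95], "d_fake": [0.05, 0.1]}
# Hypothetical scores later on: fakes are much harder to tell apart.
later = {"d_real": [0.6, 0.55], "d_fake": [0.45, 0.5]}

for name, scores in (("early", early), ("later", later)):
    d_loss = discriminator_loss(scores["d_real"], scores["d_fake"])
    g_loss = generator_loss(scores["d_fake"])
    print(name, "detective:", round(d_loss, 3), "forger:", round(g_loss, 3))
```

As the Forger improves, its loss falls while the Detective's rises. Because one network's gain is the other's setback, the two can fall out of balance during training.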
Diffusion Models: Sculpting from Static
Diffusion models power today's most popular image generators: Stable Diffusion, DALL-E, and Midjourney all rely on this approach. The core idea is delightfully counterintuitive: start with pure noise and gradually clean it up.
During training, noise is added to real images a little at a time, and the model learns to reverse that noising process. Think of it as TV static: at each step, the model learns to remove a tiny bit of the "snow" until a clear picture emerges. When you type a prompt, the model starts from random noise and takes many small denoising steps (typically dozens, sometimes hundreds, depending on the sampler), each guided by your words, until a coherent image appears.
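The noising-and-denoising idea can be sketched numerically. The example below is a simplification under stated assumptions: it uses a linear noise schedule of the kind commonly seen in diffusion tutorials (the specific `beta_start` and `beta_end` values are assumptions), it treats a list of four numbers as a stand-in for an image, and it "cheats" by handing the true noise to the denoising step. In a real system a trained neural network has to predict that noise, and the cleanup happens gradually over many steps rather than in one jump.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal remaining after each noising step."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean signal with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, predicted_noise, alpha_bar):
    """Invert the blend, given an estimate of the noise that was added."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale for x, n in zip(xt, predicted_noise)]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
clean = [1.0, -1.0, 0.5, 0.0]  # stand-in for pixel values
noisy, true_noise = add_noise(clean, alpha_bars[500], rng)

# A trained network would predict the noise; here we reuse the true noise.
recovered = remove_noise(noisy, true_noise, alpha_bars[500])
print("signal left at the final step:", alpha_bars[-1])
print("recovered:", [round(v, 6) for v in recovered])
```

By the last step almost none of the original signal remains, which is why generation can start from pure noise. The quality of the result then depends entirely on how well the model estimates the noise at each step.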
Compared to GANs, diffusion models are more stable to train and produce strikingly diverse, high-quality outputs. The trade-off is speed: all those denoising steps take more computation than a single GAN forward pass.
| Model Family | How It Generates | Typical Use | Key Strength |
|---|---|---|---|
| LLMs | Predicts next token | Text, code, chat | Fluent language & reasoning |
| GANs | Forger vs. Detective | Faces, style transfer | Fast, sharp output |
| Diffusion | Denoises from static | Images, video, audio | Diversity & quality |
The Temperature Knob: How Randomness Shapes Output
Every generative AI system has some concept of randomness baked in. For LLMs, it's made explicit through a setting called temperature. When the model has assigned probabilities to possible next tokens, temperature controls whether it plays it safe or takes a risk.
- Low temperature (e.g., 0.1): The model almost always picks the most probable token. Output is predictable, consistent, and can feel repetitive. Good for factual Q&A or code generation where correctness matters.
- High temperature (e.g., 1.5): The model samples more freely from the probability distribution, sometimes picking surprising or rare tokens. Output is creative and varied, but occasionally drifts into nonsense.
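The effect of the two settings above is easy to see with numbers. This sketch applies the usual temperature-scaled softmax: divide each raw score by the temperature, then normalise into probabilities. The candidate words and their scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually treated as 'always pick the top token'")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate next words.
words = ["mat", "floor", "roof"]
logits = [2.0, 1.0, 0.1]

for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in zip(words, probs)})
```

At 0.1 nearly all the probability lands on the top-scoring word. At 1.5 the distribution flattens, so the less likely words get picked noticeably more often.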
Image generators have a related dial: the amount of "guidance" applied at each denoising step. More guidance keeps the image closer to your prompt but potentially less creative. Less guidance gives dreamier, less literal results.
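One common way this dial is implemented is classifier-free guidance, sketched below. That choice is an assumption on our part, since individual image tools don't all document how their guidance setting works. The function name and the numbers are hypothetical. At each denoising step the model makes two noise predictions, one that ignores the prompt and one that uses it, and the guidance scale decides how far to push toward the prompt-aware one.

```python
def guided_noise(uncond, cond, guidance_scale):
    """Classifier-free guidance: move from the prompt-free prediction toward the prompted one."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Hypothetical noise predictions for a tiny 3-value "image".
uncond = [0.2, -0.1, 0.4]  # prediction that ignores the prompt
cond = [0.5, 0.0, 0.1]     # prediction that uses the prompt

for scale in (0.0, 1.0, 7.5):
    print(scale, [round(v, 3) for v in guided_noise(uncond, cond, scale)])
```

A scale of 0 ignores the prompt entirely and a scale of 1 simply uses the prompted prediction. Larger values exaggerate the difference between the two, which is what pulls the image more tightly toward the prompt.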
Real-World Uses
Generative AI has moved from research labs into daily tools faster than almost any technology in history. Current applications include:
- Writing & summarisation: drafting emails, blog posts, legal summaries, and customer-support replies.
- Code generation: autocompleting functions, explaining bugs, converting code between languages.
- Image & video creation: product mockups, concept art, film pre-visualisation, synthetic training data.
- Audio & music: voice cloning for accessibility tools, custom background music, podcast production.
- Drug discovery: generating candidate molecular structures that might bind to a target protein.
A Balanced Note on Risks
Generative AI is powerful enough that its misuse is a genuine concern, not a sci-fi worry:
- Deepfakes: hyper-realistic but fabricated video and audio of real people can spread misinformation or damage reputations.
- Misinformation at scale: LLMs can produce plausible-sounding false claims quickly and cheaply, flooding information spaces.
- Copyright and ownership: when a model is trained on millions of artworks, who owns the output? Courts and legislators are still working this out.
- Bias amplification: models trained on human-generated data inherit and can amplify societal biases.
Being informed about how these systems work, which is exactly what you're doing, is the first step toward using them responsibly.
Frequently Asked Questions
Is ChatGPT generative AI?
Yes. ChatGPT is built on OpenAI's GPT series of models, which are Large Language Models, a type of generative AI. It generates new text by predicting one token at a time, guided by your prompt and the conversation history. It doesn't retrieve pre-written answers; it constructs each response on the fly.
How do AI image generators work?
Most modern AI image generators, including Stable Diffusion, DALL-E, and Midjourney, use diffusion models. You provide a text prompt; the model starts from pure random noise and runs it through a series of denoising steps, each one steered by the meaning of your words, until a coherent image emerges. The text is interpreted by a separate text encoder (such as CLIP) that acts as a bridge between words and pixels.
What is temperature in AI?
Temperature is a number (often between 0 and 2) that controls how much randomness an LLM uses when choosing its next word. At temperature 0 the model is effectively deterministic: it always picks the most likely word. At higher temperatures it samples more broadly, producing more surprising and creative, but sometimes less accurate, output. Many AI tools expose this as a "creativity" slider.
Play with the temperature dial in Lesson 9 →