Creating vs. Classifying
Most of the AI you met in earlier lessons was discriminative: it takes existing data and puts it in a box ("that's a cat", "this email is spam", "this tumour is benign"). Generative AI flips the job around. Instead of answering "what is this?", it answers "what could this be?" and then actually makes it.
Generative models can output text, images, audio, video, code, 3-D molecules, and protein structures; in short, anything that can be represented as numbers. Three major families today are LLMs, GANs, and diffusion models.
Family 1 - LLMs: One Token at a Time
You met Large Language Models in Lessons 07–08. At generation time they work like an autocomplete engine on steroids: given everything typed so far, the model scores every token (a word or word fragment) in its vocabulary and then samples one according to those scores. The sampled token becomes part of the context, and the process repeats, token by token, until the model produces a full sentence, paragraph, or essay.
| Step | What happens |
|---|---|
| 1. Score | The model assigns a raw logit to every possible next token (e.g., 50 000 tokens). |
| 2. Softmax + Temperature | Logits are divided by a temperature T, then converted to probabilities via softmax. |
| 3. Sample | One token is drawn from the probability distribution (not always the highest!) |
| 4. Append & repeat | Chosen token appended to context; repeat until done. |
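The four steps in the table can be sketched as a short loop. This is a minimal illustration, not how any particular LLM is implemented: `VOCAB`, `toy_logits`, and `generate` are hypothetical names, and the scores are hard-coded rather than produced by a neural network.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_logits(context):
    """Stand-in for a trained model: one raw score per vocabulary token.

    The scores are made up: the token that follows the last context token
    in VOCAB order gets a high score, everything else gets zero.
    """
    last = context[-1] if context else None
    logits = []
    for i in range(len(VOCAB)):
        prev = VOCAB[i - 1] if i > 0 else None
        logits.append(3.0 if prev == last else 0.0)
    return logits

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=10, temperature=1.0, seed=0):
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    context = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(context)                            # 1. Score
        probs = softmax_with_temperature(logits, temperature)   # 2. Softmax + temperature
        token = rng.choices(VOCAB, weights=probs, k=1)[0]       # 3. Sample
        context.append(token)                                   # 4. Append & repeat
        if token == ".":  # toy stopping rule
            break
    return context

print(generate(["the"], temperature=1.0))
print(generate(["the"], temperature=0.05))
```

At a very low temperature the loop almost always follows the highest-scoring token; at higher temperatures it wanders more, because lower-scoring tokens keep a meaningful share of the probability.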
Family 2 - GANs: Forger vs. Detective
Generative Adversarial Networks (GANs, 2014) pit two neural networks against each other in a continuous game: a generator (the forger), which turns random noise into fake samples, and a discriminator (the detective), which tries to tell those fakes apart from real data.
Training alternates: the detective gets better at spotting fakes, so the forger has to get better at making them. The forger improves, so the detective must sharpen up again. Over thousands of rounds both networks improve together, like a counterfeiter and a cop locked in an arms race. A well-trained generator can produce images that are very hard to distinguish from photographs.
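One way to make the arms race concrete is to look at the two scores each network tries to lower. The sketch below assumes the commonly used binary cross-entropy form with a "non-saturating" forger loss; the lesson does not say which loss it has in mind, and the function names are hypothetical. The inputs are the detective's estimated probabilities that a sample is real.

```python
import math

EPS = 1e-12  # avoids log(0)

def discriminator_loss(d_real, d_fake):
    """Detective's loss: low when real samples score near 1 and fakes near 0.

    d_real: detective's "this is real" probabilities for real samples.
    d_fake: the same probabilities for the forger's samples.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Forger's loss: low when the detective is fooled (d_fake near 1)."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)

# Made-up detective outputs at three imagined points in training.
for label, d_real, d_fake in [
    ("detective winning", [0.95, 0.90], [0.05, 0.10]),
    ("evenly matched",    [0.50, 0.50], [0.50, 0.50]),
    ("forger winning",    [0.60, 0.55], [0.90, 0.85]),
]:
    print(label, round(discriminator_loss(d_real, d_fake), 3),
          round(generator_loss(d_fake), 3))
```

The two losses pull in opposite directions on the same fake samples, which is what makes the training adversarial.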
Family 3 - Diffusion Models: Sculpting from Static
Diffusion models power today's headline image generators (DALL·E, Stable Diffusion, Midjourney). The idea sounds almost magical but is beautifully principled.
| Phase | What happens | Used at |
|---|---|---|
| Forward (training) | Real images gradually buried in Gaussian noise over T steps. | Training time only |
| Reverse (inference) | Network predicts & removes a little noise each step, guided by a text prompt. | Generation time |
| Guidance | Text or other conditioning steers which image emerges from the noise. | Generation time |
Unlike GANs, diffusion models train stably on a simple loss (predict the noise that was added), which is one reason they scale so well.
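The forward phase and the "predict the noise" loss can be sketched in a few lines. This assumes a DDPM-style formulation with a linear noise schedule; the schedule values, the function names, and the four-number "image" are illustrative assumptions, not details given in the lesson.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step (linear schedule)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: mix the clean data x0 with Gaussian noise for step t."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise

def noise_prediction_loss(predicted_noise, true_noise):
    """Training loss: mean squared error between predicted and actual noise."""
    return sum((p - n) ** 2 for p, n in zip(predicted_noise, true_noise)) / len(true_noise)

rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -0.2, 0.9, 0.0]  # a made-up 4-"pixel" image

for t in (0, 250, 999):
    xt, noise = add_noise(x0, t, alpha_bars, rng)
    print(t, round(alpha_bars[t], 4), [round(v, 2) for v in xt])

# A network that guessed "no noise at all" would be scored like this:
print(noise_prediction_loss([0.0] * len(noise), noise))
```

During training, a network would see `xt` and `t` and be scored with `noise_prediction_loss` on its guess of `noise`; at generation time the learned guess is subtracted a little at a time, starting from pure noise.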
The Temperature Knob: Creativity vs. Predictability
Whenever a generative model samples from a probability distribution, it can apply a temperature parameter T that reshapes the distribution before sampling:
softmax_T(logits)_i = exp(logit_i / T) / Σ_j exp(logit_j / T)
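With T < 1 the distribution sharpens, so the most likely tokens dominate and output becomes more predictable; as T approaches 0, sampling approaches always picking the top-scoring token. With T > 1 the distribution flattens, so unlikely tokens are chosen more often and output becomes more varied (and more error-prone). Many LLM APIs default to T ≈ 0.7–1.0 for a balance of coherence and variety. Coding assistants often use a low value such as T = 0.2 for more predictable output; creative writing tools may push T to 1.5 or higher.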
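To see the formula at work, here is a small sketch that applies it to a handful of candidate next words for the context "The weather today is ...". The candidate words and their logits are invented for illustration; only the formula comes from the text above.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Compute exp(logit_i / T) / sum_j exp(logit_j / T) for every i."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtracting the max does not change the result
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for the next word after "The weather today is ..."
candidates = ["sunny", "rainy", "cold", "purple"]
logits = [2.0, 1.5, 1.0, -1.0]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    row = ", ".join(f"{w}={p:.2f}" for w, p in zip(candidates, probs))
    print(f"T={t}: {row}")
```

Running it shows the top word taking most of the probability at T = 0.2, while at T = 1.5 even the odd candidate keeps a visible share.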
Prompts, Creativity & Responsibility
A prompt is the input you give a generative model: a text description, a seed image, or a handful of few-shot examples. Because the output space is vast, the prompt acts as a steering wheel. The field of prompt engineering explores how to word prompts for reliable, high-quality outputs.
Generative AI also brings serious risks. A quick preview (Lesson 10 dives deeper):
- Deepfakes: Hyper-realistic synthetic images/video of real people, enabling fraud and disinformation.
- Copyright & IP: Models trained on copyrighted art can reproduce or closely mimic it.
- Hallucination: LLMs confidently generate plausible-sounding falsehoods.
- Misuse: Spam generation, phishing content, and propaganda at scale.