Generative AI: GANs, Diffusion & Temperature

🎨Creating vs. Classifying

Most of the AI you met in earlier lessons was discriminative — it takes existing data and puts it in a box: "that's a cat", "this email is spam", "this tumour is benign." Generative AI flips the job around. Instead of answering what is this? it answers what could this be? — and then actually makes it.

Analogy — the critic vs. the artist A film critic watches a thousand movies and learns to say "this is good" or "this is trash." A film director also watches a thousand movies — but then picks up a camera and makes one. Discriminative AI is the critic. Generative AI is the director.

Generative models can output text, images, audio, video, code, 3‑D molecules, protein structures: anything that can be represented as numbers. Three influential families, covered in this lesson, are LLMs, GANs, and diffusion models.

🔤Family 1 — LLMs: One Token at a Time

You met Large Language Models in Lessons 07–08. At generation time they work like an autocomplete engine on steroids: given everything typed so far, the model scores every token (a word or word fragment) in its vocabulary and then samples one according to those scores. That sampled token becomes part of the context, and the process repeats, token by token, until the model produces a full sentence, paragraph, or essay.

Key insight An LLM never "decides" to write a paragraph. It decides one token at a time, and the paragraph emerges from hundreds of micro-decisions chained together.

Step	What happens
1. Score	The model assigns a raw logit to every possible next token (e.g., 50 000 tokens).
2. Softmax + Temperature	Logits are divided by a temperature T, then converted to probabilities via softmax.
3. Sample	One token is drawn from the probability distribution (not always the highest-probability one).
4. Append & repeat	The chosen token is appended to the context; the loop repeats until a stop token or a length limit is reached.
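
The four steps in the table can be sketched as a short loop. This is a toy illustration, not a real language model: the vocabulary `VOCAB`, the scoring function `toy_logits`, and the `<end>` stop token are all hypothetical stand-ins invented for this example.

```python
import math
import random

# Hypothetical toy vocabulary (not from the lesson).
VOCAB = ["sunny", "rainy", "cold", "<end>"]

def toy_logits(context):
    """Stand-in for a real model: returns one raw score per vocabulary token."""
    # The "<end>" score grows with context length, so generation eventually stops.
    return [2.0, 1.0, 0.5, 0.2 * len(context)]

def softmax_with_temperature(logits, temperature):
    """Step 2: divide logits by T, then normalise with softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, temperature=0.8, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(context)                           # 1. Score
        probs = softmax_with_temperature(logits, temperature)  # 2. Softmax + temperature
        token = rng.choices(VOCAB, weights=probs, k=1)[0]      # 3. Sample
        if token == "<end>":
            break
        context.append(token)                                  # 4. Append & repeat
    return context

print(generate(["The", "weather", "today", "is"]))
```

Changing `seed` or `temperature` changes which tokens are drawn, while the loop itself stays the same.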

🕵️Family 2 — GANs: Forger vs. Detective

Generative Adversarial Networks (GANs, 2014) pit two neural networks against each other in a continuous game:

The Generator (Forger) Takes pure random noise and tries to produce something that looks real — a fake painting, a fake face, a fake voice clip. Its only goal: fool the detective.

The Discriminator (Detective) Sees a mix of real samples and the forger's fakes. It must label each as real or fake. Its only goal: catch the forger.

Training alternates: the detective gets better at spotting fakes, so the forger has to get better at making them. The forger improves, so the detective must sharpen up again. Over thousands of rounds both networks improve together, like a counterfeiter and a cop locked in an arms race. When well trained, the generator can produce images that are very hard to tell apart from photographs.
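
The two opposing goals can be written as two loss functions. The sketch below assumes the common binary cross-entropy formulation, with the "non-saturating" generator loss; it only computes the losses from made-up discriminator outputs and does not train any network. The function names and the example scores are illustrative assumptions.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for one probability in (0, 1) and a 0/1 target."""
    eps = 1e-12  # avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Detective: wants D(real) near 1 and D(fake) near 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Forger: wants the detective to output 1 on fakes (non-saturating form)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs: probability that each sample is real.
early = {"d_real": [0.9, 0.8], "d_fake": [0.1, 0.2]}  # detective is winning
later = {"d_real": [0.6, 0.5], "d_fake": [0.5, 0.4]}  # forger has caught up

for name, scores in (("early", early), ("later", later)):
    print(name,
          "D loss:", round(discriminator_loss(**scores), 3),
          "G loss:", round(generator_loss(scores["d_fake"]), 3))
```

As the forger catches up, its loss falls and the detective's loss rises, which is the arms race in numeric form.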

Why GANs can be unstable If one player gets too good too fast, the game collapses. The forger might produce only a single type of output that always fools the detective ("mode collapse"), or training can oscillate endlessly. GANs require careful tuning — one reason diffusion models have largely taken over for image generation.

🌫️Family 3 — Diffusion Models: Sculpting from Static

Diffusion models power today's headline image generators (DALL·E, Stable Diffusion, Midjourney). The idea sounds almost magical but is beautifully principled.

The TV-static analogy Imagine a clear photograph that someone slowly buries under TV static, with a little more noise added at each step until the image is completely unrecognisable. A diffusion model is trained to reverse that process: given a slightly noisy image, predict what it looked like with slightly less noise. Apply that denoising step 20–1000 times, starting from pure static, and a crisp image emerges, like a sculptor brushing dust off a hidden statue.

Phase	What happens	Used at
Forward (training)	Real images are gradually buried in Gaussian noise over a fixed number of steps.	Training time only
Reverse (inference)	Network predicts & removes a little noise each step, guided by a text prompt.	Generation time
Guidance	Text or other conditioning steers which image emerges from the noise.	Generation time

Unlike GANs, diffusion models train stably on a simple loss (predict the noise that was added), which is one reason they scale so well.
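
The forward process and the "predict the noise" loss can be sketched in a few lines. This assumes a DDPM-style setup with a linear noise schedule; the schedule endpoints, the 4-pixel "image", and the zero-predicting placeholder network are assumptions chosen for illustration, not details from the lesson.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Forward process: jump straight to step t by mixing signal and noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    xt = [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]
    return xt, noise

def noise_prediction_loss(predicted_noise, true_noise):
    """Training loss: mean squared error between predicted and added noise."""
    return sum((p - n) ** 2 for p, n in zip(predicted_noise, true_noise)) / len(true_noise)

rng = random.Random(0)
alpha_bars = make_alpha_bars(num_steps=1000)
image = [0.9, -0.3, 0.5, 0.1]  # a made-up 4-pixel "image"

for t in (0, 499, 999):
    noisy, noise = add_noise(image, t, alpha_bars, rng)
    print(f"step {t:4d}: signal kept = {math.sqrt(alpha_bars[t]):.3f}")

# Placeholder "network" that always predicts zero noise, just to exercise the loss.
print("loss of a zero predictor:", noise_prediction_loss([0.0] * len(noise), noise))
```

A real model would replace the zero predictor with a neural network that takes the noisy image and the step index as input.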

🌡️The Temperature Knob — Creativity vs. Predictability

Whenever a generative model samples from a probability distribution, it can apply a temperature parameter T that reshapes the distribution before sampling:

softmax_T(logits)_i  =  exp(logit_i / T)  /  Σ_j exp(logit_j / T)

Low Temperature (T → 0) The highest-scoring option gets almost all the probability. Output is safe, predictable, repetitive. Like a writer who always picks the most expected word.

High Temperature (T → ∞) Probabilities flatten — even unlikely words get a real chance. Output is creative, surprising, sometimes nonsensical. Like a poet who never uses the obvious word.
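
A small numeric check makes both extremes concrete. The words and logits below are made up for illustration. Since T = 0 would divide by zero, this sketch assumes the convention of treating it as "always pick the top-scoring option".

```python
import math

def softmax_t(logits, temperature):
    """Temperature-scaled softmax; T = 0 is handled as a special case."""
    if temperature == 0:
        # Assumption: treat T = 0 as greedy selection of the top-scoring option.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["sunny", "rainy", "cold", "purple"]
logits = [3.0, 2.0, 1.0, -1.0]  # made-up scores

for t in (0, 0.2, 1.0, 5.0):
    probs = softmax_t(logits, t)
    row = "  ".join(f"{w}={p:.2f}" for w, p in zip(words, probs))
    print(f"T={t:<4} {row}")
```

Reading down the printed rows, probability mass moves from the top word towards an even spread as T grows.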

Many LLM APIs default to T ≈ 0.7–1.0 for a balance of coherence and variety. Coding assistants often use a low value such as T = 0.2 for more repeatable output (only T → 0 approaches full determinism); creative writing tools push T up to 1.5 or higher.

Temperature Dial 🌡️ Interactive

Context: "The weather today is ..." — adjust temperature and watch how the probability of each next word changes. Then sample to see what the model would actually pick.

Temperature 0.80


✍️Prompts, Creativity & Responsibility

A prompt is the input you give a generative model — a text description, a seed image, a few-shot example. Because the output space is vast, the prompt acts as a steering wheel. The field of prompt engineering explores how to word prompts for reliable, high-quality outputs.

Generative AI also brings serious risks. A quick preview (Lesson 10 dives deeper):

Risks to know

Deepfakes: Hyper-realistic synthetic images/video of real people, enabling fraud and disinformation.
Copyright & IP: Models trained on copyrighted art can reproduce or closely mimic it.
Hallucination: LLMs confidently generate plausible-sounding falsehoods.
Misuse: Spam generation, phishing content, and propaganda at scale.

Generative Art Machine 🎨 Interactive

Each seed produces a unique, deterministic piece of abstract art. The same seed always yields the same image, much like a real generative model run with a fixed random seed and identical settings. Change the knobs to explore the creative space.
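
The seed behaviour is easy to reproduce with a seeded random number generator. The sketch below is not the widget's actual code: `make_artwork` and its circle format are hypothetical, chosen only to show that a fixed seed fixes the output.

```python
import random

def make_artwork(seed, num_shapes=40):
    """Return a list of (x, y, radius) circles determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        (round(rng.random(), 3), round(rng.random(), 3), round(rng.uniform(0.01, 0.1), 3))
        for _ in range(num_shapes)
    ]

first = make_artwork(42)
second = make_artwork(42)
other = make_artwork(43)

print("same seed, same art:", first == second)
print("different seed, same art:", first == other)
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the result independent of any other random calls in the program.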

Controls: Seed (42), Shapes (40), Palette, Symmetry