Natural Language Processing (NLP) Basics

💬 What Is NLP?

Natural Language Processing (NLP) is the branch of AI that gives computers the ability to work with human language — reading it, understanding its meaning, and even producing it. Sounds simple? Language is actually one of the hardest problems in AI.

🦆 Analogy — "I saw her duck" Did she duck (dodge)? Or did she own a duck (the bird)? Same words, two totally different meanings. Humans resolve this in a flash using context. Machines had to learn every one of these ambiguities from billions of examples.

Language is full of sarcasm, idioms, pronouns, spelling errors, cultural references, and ever-changing slang. Every sentence is a puzzle. NLP is the set of tools — from simple word counts to massive neural networks — that lets machines start solving that puzzle.

✂️ Tokenization: Chopping Text Into Pieces

Before a machine can "read" text, it needs the text as a list of standardised pieces it can process. We call each piece a token. Tokenization is the act of splitting text into tokens.

💡 Tip — Not just words Modern models often use subword tokens. The word "unbelievable" might split into un + believ + able. This helps handle rare words — even words the model has never seen can be built from familiar parts.
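To make the subword idea concrete, here is a minimal Python sketch of greedy longest-match splitting. It is only an illustration: the function name `subword_tokenize` and the tiny `TOY_VOCAB` are assumptions made for this example, and production tokenizers (BPE, WordPiece and similar) learn their vocabularies from data and differ in detail.

```python
def subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                break
        else:
            # No vocabulary piece matches here: fall back to one character.
            end = start + 1
            piece = word[start:end]
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary, invented for this example.
TOY_VOCAB = {"un", "believ", "able", "break", "happy"}

print(subword_tokenize("unbelievable", TOY_VOCAB))  # ['un', 'believ', 'able']
print(subword_tokenize("unbreakable", TOY_VOCAB))   # ['un', 'break', 'able']
```

Note that "unbreakable" is not in the vocabulary as a whole word, yet it can still be built from familiar parts.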

After tokenization, each token is looked up in the model's vocabulary and replaced by its integer ID; the same token always maps to the same ID. The sentence "The cat sat." might become [482, 1751, 992, 13]. The model works with those numbers, not letters.

Example: tokenizing the sentence below (simple whitespace + punctuation split)
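A minimal Python sketch of a whitespace + punctuation split, followed by ID assignment. The helper names `tokenize` and `build_vocab` are assumptions for illustration, and the IDs here are simply handed out in order of first appearance, so they will not match the sample IDs quoted earlier.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


tokens = tokenize("The cat sat.")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]

print(tokens)  # ['The', 'cat', 'sat', '.']
print(ids)     # [0, 1, 2, 3]
```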

🔢 Turning Words Into Numbers: Embeddings

Neural networks operate on numbers, so we need a way to turn every word into a number or, better, a vector (a list of numbers). The simplest approach, Bag of Words, gives each word in the vocabulary a slot in a giant array: 1 if the word appears in a document, 0 if not (a common variant stores how many times it appears instead). It works, but it loses word order and meaning: "dog bites man" and "man bites dog" get exactly the same vector.
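Here is a small Python sketch of the 0/1 version of Bag of Words. The five-word `vocabulary` and the function name `bag_of_words` are made up for this example; a real vocabulary would have thousands of slots.

```python
def bag_of_words(document, vocabulary):
    """Return a 0/1 vector with one slot per vocabulary word."""
    words = set(document.lower().split())
    return [1 if word in words else 0 for word in vocabulary]


# Hypothetical tiny vocabulary, invented for this example.
vocabulary = ["cat", "dog", "sat", "barked", "the"]

print(bag_of_words("The cat sat", vocabulary))     # [1, 0, 1, 0, 1]
print(bag_of_words("Sat the cat", vocabulary))     # [1, 0, 1, 0, 1]
print(bag_of_words("The dog barked", vocabulary))  # [0, 1, 0, 1, 1]
```

The first two documents produce identical vectors even though the word order differs, which is exactly the information this representation throws away.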

A much richer idea: Word Embeddings. Train the model so that each word maps to a compact vector of, say, 300 numbers, learned such that similar words end up near each other in that vector space.

📍 Analogy — Words on a map Imagine plotting words on a map where cities represent meanings. "Paris" and "London" sit close together (both capitals). "Dog" and "Cat" cluster near each other (both pets). And the directions on the map are consistent: the vector from "king" to "queen" is almost the same as from "man" to "woman" — the direction encodes gender. That's why the famous equation works:

king − man + woman ≈ queen

king:  [0.92, 0.18, 0.74, 0.05, ...]
queen: [0.89, 0.81, 0.71, 0.09, ...]
dog:   [0.12, 0.09, 0.03, 0.88, ...]
cat:   [0.14, 0.11, 0.02, 0.85, ...]

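Notice: king and queen have similar first and third values (think of these as "royalty" dimensions) but differ sharply in the second (0.18 vs 0.81), which is the kind of dimension that could encode gender. dog and cat are similar in the fourth (an "animal" dimension). These numbers are illustrative: in real embeddings the individual dimensions rarely have such tidy labels, but the overall patterns of similarity are learned automatically from data.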
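One common way to measure "closeness" between vectors is cosine similarity. The Python sketch below applies it to the first four values of the illustrative vectors above. The vectors for "man" and "woman" do not appear in this lesson; they are hypothetical values invented here so that the king − man + woman arithmetic can be tried out, and the helper names `cosine` and `analogy` are likewise assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors = {
    # First four values of the illustrative vectors from the lesson.
    "king":  [0.92, 0.18, 0.74, 0.05],
    "queen": [0.89, 0.81, 0.71, 0.09],
    "dog":   [0.12, 0.09, 0.03, 0.88],
    "cat":   [0.14, 0.11, 0.02, 0.85],
    # Hypothetical vectors, invented here so the analogy can be computed.
    "man":   [0.10, 0.15, 0.12, 0.06],
    "woman": [0.08, 0.80, 0.10, 0.08],
}


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(cosine(vectors["dog"], vectors["cat"]))   # close to 1: very similar
print(cosine(vectors["king"], vectors["dog"]))  # much lower: unrelated
print(analogy("king", "man", "woman"))          # queen
```

Excluding the three input words from the candidates is the usual convention for this kind of analogy test; without it, the nearest word to the result is often one of the inputs.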

🛠️ What Can NLP Do?

NLP powers a huge range of applications you use every day:

😊 Sentiment Analysis: Is this review positive, negative, or neutral?
🌐 Translation: Convert text from one language to another.
📝 Summarisation: Squeeze a long document to its key points.
❓ Question Answering: Find the answer to a question inside a passage.
🔮 Autocomplete: Predict the next word (or sentence) a user will type.
🏷️ Named Entity Recognition: Spot people, places, organisations in text.

🚀 Coming up — Lesson 08: Transformers & LLMs Classic NLP methods (bag-of-words, simple word embeddings, RNNs) paved the road — but in 2017 the Transformer architecture arrived and changed everything. The next lesson explores how Transformers handle context, attention, and scale up to GPT-level language models.

Sentiment Detector 😊😡 Interactive

Type a sentence or short review into the detector. It uses a built-in lexicon of positive and negative words, handles simple negation ("not good" flips the score of "good"), and shows which words influenced the score.
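A lexicon-based detector of this kind can be sketched in a few lines of Python. This is not the detector's actual code: the word lists, the function name `score_sentiment`, and the rule "flip the sign if the previous word is a negator" are all assumptions chosen to illustrate the idea.

```python
import re

# Hypothetical mini-lexicon; the detector's real word lists are not shown here.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "boring"}
NEGATORS = {"not", "never", "no"}


def score_sentiment(text):
    """Return (label, score, influential_words) for a short text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    influential = []
    for i, word in enumerate(words):
        if word in POSITIVE:
            value = 1
        elif word in NEGATIVE:
            value = -1
        else:
            continue  # word carries no sentiment in this lexicon
        # Simple negation: flip the sign when the previous word is a negator.
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
            word = words[i - 1] + " " + word
        score += value
        influential.append((word, value))
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score, influential


print(score_sentiment("The food was not good and the service was terrible."))
# ('negative', -2, [('not good', -1), ('terrible', -1)])
```

A rule this simple misses sarcasm, negators that sit further away ("not very good"), and any word outside the lexicon, which is one reason learned models took over this task.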

Next-Word Oracle 🔮 (a tiny language model) Interactive

This is a bigram Markov model, one of the simplest possible language models. It learned, from a short built-in corpus, which words tend to follow each other. Pick a starting word, choose a length, and watch it generate text by repeatedly sampling a next word in proportion to how often that word followed the current one. GPT-style models rest on the same core idea, predicting the next token from what came before, at a vastly larger scale.

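How it works: The model scanned every pair of consecutive words in the corpus and counted them. When generating, it looks up the current word, picks a random successor weighted by how often each one appeared, then repeats. Bigger models also generate one token at a time from a probability distribution, but they compute that distribution with a neural network that looks at a much longer context than a single previous word, and they are trained on far more data.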
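The same procedure can be sketched in Python. The `CORPUS` string below is a made-up stand-in (the oracle's built-in corpus is not shown here), and the names `train_bigrams` and `generate` are assumptions for this example.

```python
import random
from collections import Counter, defaultdict

# Hypothetical stand-in for the built-in corpus.
CORPUS = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."


def train_bigrams(text):
    """Count how often each word follows each other word."""
    words = text.split()
    successors = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        successors[current][nxt] += 1
    return successors


def generate(successors, start, length, rng=random):
    """Generate up to `length` words by weighted random sampling."""
    output = [start]
    while len(output) < length:
        counts = successors.get(output[-1])
        if not counts:
            break  # dead end: this word never had a successor in the corpus
        choices = list(counts)
        weights = [counts[w] for w in choices]
        output.append(rng.choices(choices, weights=weights, k=1)[0])
    return output


model = train_bigrams(CORPUS)
print(dict(model["the"]))  # {'cat': 2, 'mat': 1, 'dog': 2, 'rug': 1}
print(" ".join(generate(model, "the", 12)))
```

Because the next word is sampled rather than always taking the top choice, running `generate` twice from the same starting word will usually give different text. Passing a seeded `random.Random(...)` as `rng` makes a run repeatable.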