Supervised vs Unsupervised vs Reinforcement Learning

The Three Paradigms

Machine learning is traditionally divided into three categories based on how the model learns:

Supervised Learning

Definition: The model learns from labeled examples — input/output pairs where the correct answer is provided.

Training data:
Email: "Win a free iPhone!" → Label: SPAM
Email: "Meeting at 3pm"    → Label: NOT SPAM
Email: "Claim your prize"  → Label: SPAM

Model learns: what patterns predict spam
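As a minimal sketch of the idea above, the three labeled emails can be fed to a small text classifier. The choice of scikit-learn's CountVectorizer and MultinomialNB is an assumption made for illustration, and so is the unseen test email. Three examples are far too few for a real spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# The three labeled examples from the text (inputs plus their correct answers)
emails = ["Win a free iPhone!", "Meeting at 3pm", "Claim your prize"]
labels = ["SPAM", "NOT SPAM", "SPAM"]

# Bag-of-words features followed by a Naive Bayes classifier (illustrative choice)
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

# Hypothetical unseen email, not part of the original data
prediction = spam_model.predict(["Claim your free prize"])[0]
print(prediction)
```

The model only sees word counts here, so it labels the new email by how strongly its words are associated with each label in the training set.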

Two types

Classification — predict a category

# Is this email spam? (yes/no)
# Is this tumor malignant? (yes/no)
# Which digit is this? (0-9)
model.predict([email])  # returns e.g. ["spam"]

Regression — predict a number

# What will this house sell for?
# What will the stock price be tomorrow?
model.predict([house_features])  # returns e.g. [450000.0]
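A regression model can be sketched the same way. The house data below is hypothetical, and LinearRegression is just one possible model, chosen here because it is the simplest.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [square_feet, bedrooms] -> sale price
house_features = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]
sale_prices = [200_000, 290_000, 380_000, 470_000]

price_model = LinearRegression()
price_model.fit(house_features, sale_prices)

# Predict a continuous value (not a category) for an unseen house
predicted_price = price_model.predict([[1800, 3]])[0]
print(round(predicted_price))
```

The only structural difference from classification is the target: a number on a continuous scale rather than one label from a fixed set.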

Real-world examples

Email spam detection
Medical diagnosis
Credit scoring
Image classification
Price prediction

The catch

You need labeled data. Someone usually has to label thousands of examples by hand, which is expensive and time-consuming.

Unsupervised Learning

Definition: The model finds patterns in data without labels. No correct answers are provided.

Training data:
Customer 1: [age=25, purchases=10, avg_spend=50]
Customer 2: [age=45, purchases=2,  avg_spend=500]
Customer 3: [age=26, purchases=8,  avg_spend=45]
...

Model discovers: natural groupings of similar customers (for example, 3 groups)

Two main types

Clustering — group similar items

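from sklearn.cluster import KMeans

# Group customers by behavior; the number of clusters (3) is chosen by you, not learned
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(customer_data)  # customer_data: array of shape (n_customers, n_features)
# Possible interpretation of the clusters: budget shoppers, premium buyers, occasional buyers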
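To make the clustering step concrete, here is a self-contained sketch with six hypothetical customers (the values are invented for illustration). It shows how to read the cluster assignments back out. Note that k-means returns only arbitrary integer ids; names such as "budget shoppers" are an interpretation added by a person afterwards.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [age, purchases, avg_spend]
customers = np.array([
    [25, 10, 50], [26, 8, 45],    # frequent, low spend
    [45, 2, 500], [47, 3, 480],   # rare, high spend
    [60, 1, 200], [62, 2, 210],   # rare, medium spend
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(customers)

# Cluster ids are arbitrary integers; only the grouping is meaningful
for cid in range(3):
    members = customers[cluster_ids == cid]
    print(cid, members.mean(axis=0))
```

In practice the features would usually be scaled first (for example with StandardScaler), since avg_spend dominates the distance calculation in this raw form.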

Dimensionality Reduction — compress data while preserving structure

from sklearn.decomposition import PCA

# Reduce 100 features to 2 for visualization
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)  # data: (n_samples, 100) -> reduced: (n_samples, 2)
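One way to check how much structure survives the compression is the explained variance ratio. The sketch below uses synthetic data (an assumption for illustration) in which 100 features are driven by only 2 hidden factors, so two components should capture almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 100 features driven by 2 hidden factors plus small noise
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 100))
data = hidden @ mixing + 0.01 * rng.normal(size=(200, 100))

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)

print(reduced.shape)
# Fraction of the total variance captured by each kept component
print(pca.explained_variance_ratio_)
```

On real data the ratio is typically much lower for two components, which is a useful warning that a 2D plot is hiding part of the picture.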

Real-world examples

Customer segmentation
Anomaly detection (fraud, network intrusion)
Topic modeling in documents
Recommendation systems (find similar users)
Data compression

Reinforcement Learning

Definition: An agent learns by taking actions in an environment and receiving rewards or penalties.

Agent: AI player
Environment: Chess board
Action: Move a piece
Reward: +1 for winning, -1 for losing, 0 otherwise

Agent learns: which moves lead to winning

The exploration vs exploitation dilemma

Exploit — do what you know works (get safe rewards)
Explore — try new things that might work better (risk lower rewards)

Too much exploitation → stuck in local optimum
Too much exploration → never converges
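A common way to balance the two is the epsilon-greedy rule: explore with a small probability epsilon, exploit otherwise. The sketch below applies it to a simple multi-armed bandit rather than chess. The win probabilities, epsilon, and step count are assumptions chosen for illustration.

```python
import random

def run_epsilon_greedy(win_probs, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy on a multi-armed bandit with 0/1 rewards."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)     # times each arm was pulled
    values = [0.0] * len(win_probs)   # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(win_probs))                        # explore
        else:
            arm = max(range(len(win_probs)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values

counts, values = run_epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda a: values[a])
print(best_arm, counts)
```

With epsilon=0 the agent can lock onto the first arm that ever pays out, and with epsilon=1 it pulls arms uniformly at random and never benefits from what it has learned.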

Real-world examples

Game playing (AlphaGo, OpenAI Five)
Robot locomotion
RLHF: fine-tuning models such as ChatGPT to be more helpful (human feedback as the reward signal)
Ad bidding optimization
Autonomous vehicles

Comparison

	Supervised	Unsupervised	Reinforcement
Data needed	Labeled pairs	Unlabeled data	Environment to interact with
Goal	Predict output	Find structure	Maximize reward
Feedback	Immediate (labels)	None	Delayed (reward)
Difficulty	Medium	Medium	Hard
Examples	Spam filter, diagnosis	Clustering, anomaly detection	Games, robotics, RLHF

Self-Supervised Learning — The Fourth Paradigm

Modern LLMs use a fourth approach: self-supervised learning.

The model creates its own labels from unlabeled data:

Text: "The cat sat on the ___"
Task: Predict the missing word → "mat"

No human labels needed — the text itself provides supervision

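This is how GPT, BERT, and other modern LLMs are pretrained: GPT-style models predict the next token, while BERT predicts masked tokens. Because no manual labeling is required, the approach scales to internet-scale data.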
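The way text supplies its own labels can be sketched in a few lines. This is a simplified illustration: the function name is hypothetical, and it splits on whitespace, whereas real LLMs operate on subword tokens.

```python
def make_next_word_examples(text):
    """Turn raw text into (context, target) pairs; the targets come from the text itself."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]

examples = make_next_word_examples("The cat sat on the mat")
for context, target in examples:
    print(" ".join(context), "->", target)
```

Every sentence yields several training pairs without anyone writing a label, which is why the amount of training data is limited mainly by how much text is available.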

Key Takeaway

Supervised: labeled data, predict output (classification/regression)
Unsupervised: no labels, find hidden structure (clustering/compression)
Reinforcement: learn by trial and error with rewards
Self-supervised: create labels from the data itself (how LLMs are pretrained)
Most production ML is supervised: it tends to be the most reliable choice when labeled data is available