The three fundamental paradigms of machine learning — what they are, how they differ, and which real-world problems each one solves.
Machine learning is traditionally divided into three categories, based on how the model learns: supervised, unsupervised, and reinforcement learning.
Definition (supervised learning): The model learns from labeled examples, that is, input/output pairs where the correct answer is provided.
Training data:
Email: "Win a free iPhone!" → Label: SPAM
Email: "Meeting at 3pm" → Label: NOT SPAM
Email: "Claim your prize" → Label: SPAM
Model learns: what patterns predict spam
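As a rough illustration of learning from these labeled pairs, the sketch below fits a small bag-of-words classifier with scikit-learn. The choice of `CountVectorizer` plus `MultinomialNB` is an assumption made for this example (the text does not name a model), and three emails are far too few for a real spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# The three labeled examples from the text: inputs and their correct answers.
emails = ["Win a free iPhone!", "Meeting at 3pm", "Claim your prize"]
labels = ["SPAM", "NOT SPAM", "SPAM"]

# Assumed model: word counts fed into a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# Predict labels for emails the model has not seen (hypothetical inputs).
new_emails = ["Win a free prize", "Meeting at 3pm tomorrow"]
predictions = [str(p) for p in model.predict(new_emails)]
print(predictions)
```

The model only sees word counts, so it generalizes by matching words that appeared under each label during training.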
Classification — predict a category
# Is this email spam? (yes/no)
# Is this tumor malignant? (yes/no)
# Which digit is this? (0-9)
model.predict(email)  # → "spam"
Regression — predict a number
# What will this house sell for?
# What will the stock price be tomorrow?
model.predict(house_features)  # → 450000
The main cost of supervised learning is labeled data: someone has to label thousands of examples by hand, which is expensive and time-consuming.
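To make the regression case concrete, here is a minimal sketch that fits a straight line by ordinary least squares in plain Python. The training data is invented for illustration, and using square footage as the only feature is an assumption; a real price model would use many features and a library implementation.

```python
# Hypothetical training data: (square_feet, sale_price) pairs.
# The prices follow price = 200 * square_feet + 50_000 exactly, so the fit is easy to check.
sizes = [1000, 1500, 2000, 2500]
prices = [250_000, 350_000, 450_000, 550_000]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)

def predict(square_feet):
    """Predict a sale price (a number, not a category)."""
    return slope * square_feet + intercept

print(round(predict(2000)))  # prints 450000
```

Unlike the classifier, the output here is a continuous value, which is the defining difference between regression and classification.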
Definition (unsupervised learning): The model finds patterns in data without labels. No correct answers are provided.
Training data:
Customer 1: [age=25, purchases=10, avg_spend=50]
Customer 2: [age=45, purchases=2, avg_spend=500]
Customer 3: [age=26, purchases=8, avg_spend=45]
...
Model discovers: there are 3 natural customer groups
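One way to see why groups can emerge without labels is to measure how far apart the customers are in feature space. The sketch below uses plain Euclidean distance on the three customers listed above; the distance metric is an assumption for illustration, and in practice features are usually scaled first so that a large-valued column such as `avg_spend` does not dominate.

```python
import math

# The three customers from the text: [age, purchases, avg_spend].
customers = {
    "Customer 1": [25, 10, 50],
    "Customer 2": [45, 2, 500],
    "Customer 3": [26, 8, 45],
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.dist(a, b)

d_1_3 = distance(customers["Customer 1"], customers["Customer 3"])
d_1_2 = distance(customers["Customer 1"], customers["Customer 2"])

print(f"1 vs 3: {d_1_3:.1f}")  # 1 vs 3: 5.5
print(f"1 vs 2: {d_1_2:.1f}")  # 1 vs 2: 450.5
```

Customers 1 and 3 sit close together while Customer 2 is far away, which is the kind of structure a clustering algorithm picks up on.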
Clustering — group similar items
from sklearn.cluster import KMeans
# Group customers by behavior
kmeans = KMeans(n_clusters=3)
kmeans.fit(customer_data)
# Clusters might correspond to budget shoppers, premium buyers, occasional buyers (the names are a human interpretation)
Dimensionality Reduction — compress data while preserving structure
from sklearn.decomposition import PCA
# Reduce 100 features to 2 for visualization
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)
Definition (reinforcement learning): An agent learns by taking actions in an environment and receiving rewards or penalties.
Agent: AI player
Environment: Chess board
Action: Move a piece
Reward: +1 for winning, -1 for losing, 0 otherwise
Agent learns: which moves lead to winning
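Because the reward in this setup only arrives at the end of the game, the agent needs some way to credit earlier moves. One common approach is the discounted return, sketched below. The discount factor `gamma = 0.9` and the four-move game are assumptions chosen for illustration, not part of the chess example itself.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical four-move game: reward 0 for each move, +1 when the game is won.
rewards = [0, 0, 0, 1]
returns = discounted_returns(rewards)
print([round(g, 3) for g in returns])  # [0.729, 0.81, 0.9, 1.0]
```

Moves closer to the win receive more credit, while earlier moves still get a share of the final reward.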
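The agent must balance exploration (trying new actions) against exploitation (repeating the best-known action):
Too much exploitation → stuck in a local optimum
Too much exploration → never converges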
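A simple way to experiment with this trade-off is an epsilon-greedy strategy on a multi-armed bandit, sketched below in plain Python. The bandit setting, the win probabilities, and the `epsilon` value are all assumptions for illustration; `run_bandit` is a hypothetical helper, not a library function.

```python
import random

def run_bandit(win_probs, epsilon, steps=5000, seed=0):
    """Epsilon-greedy on a multi-armed bandit; returns (estimates, pull_counts)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)  # running average reward per arm
    counts = [0] * len(win_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(win_probs))  # explore: pick a random arm
        else:
            arm = max(range(len(win_probs)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates, counts

# Hypothetical arms with win probabilities 0.2, 0.5 and 0.8.
estimates, counts = run_bandit([0.2, 0.5, 0.8], epsilon=0.1)
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", [round(v, 2) for v in estimates])
print("pulls per arm:", counts)
print("best arm:", best_arm)
```

Setting `epsilon=0` makes the agent purely greedy, so it can lock onto whichever arm pays off first. Setting `epsilon=1` makes it pull arms at random forever, so it never uses what it has learned.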
| | Supervised | Unsupervised | Reinforcement |
|---|---|---|---|
| Data needed | Labeled pairs | Unlabeled data | Environment to interact with |
| Goal | Predict output | Find structure | Maximize reward |
| Feedback | Immediate (labels) | None | Delayed (reward) |
| Difficulty | Medium | Medium | Hard |
| Examples | Spam filter, diagnosis | Clustering, anomaly detection | Games, robotics, RLHF |
Modern LLMs use a fourth approach: self-supervised learning.
The model creates its own labels from unlabeled data:
Text: "The cat sat on the ___"
Task: Predict the missing word → "mat"
No human labels needed — the text itself provides supervision
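The idea that the text supplies its own labels can be sketched in a few lines of Python. The example below builds next-word training pairs from a raw sentence; whitespace tokenization and a fixed three-word context are simplifying assumptions, and `make_next_word_pairs` is a hypothetical helper. Real LLMs use subword tokens and much longer contexts.

```python
def make_next_word_pairs(text, context_size=3):
    """Turn raw text into (context, target) training pairs with no human labels."""
    words = text.lower().split()
    pairs = []
    for i in range(context_size, len(words)):
        context = words[i - context_size:i]  # the preceding words are the input
        target = words[i]                    # the word that actually follows is the label
        pairs.append((context, target))
    return pairs

pairs = make_next_word_pairs("The cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
# the cat sat -> on
# cat sat on -> the
# sat on the -> mat
```

Every sentence yields several training examples for free, which is why this approach needs no manual labeling.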
This is how GPT, BERT, and other modern LLMs are pretrained: GPT-style models predict the next token, while BERT predicts masked tokens. The approach scales to internet-scale data without expensive labeling.