Neural networks are the engine behind modern AI. Here's how they actually work — from a single neuron to deep networks that power GPT.
A neural network is loosely inspired by the human brain. The brain has ~86 billion neurons connected by synapses. When a neuron receives enough signal, it fires and passes the signal forward.
Artificial neural networks mimic this with mathematical neurons connected in layers.
A neuron:
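```python
def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function (ReLU): keep positive values, clamp negatives to 0
    return max(0, z)
```

Activation functions add non-linearity. Without them, a stack of layers collapses into a single linear transformation, so the network is no more expressive than linear regression no matter how many layers you add.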
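
The claim that linear layers collapse is easy to check numerically. Here is a minimal sketch in plain Python, with made-up weights and no biases (`matvec`, `matmul`, `W1`, `W2`, and `x` are illustrative names, not part of any library): two layers without an activation give the same result as one layer whose weights are the product of the two matrices, while putting ReLU in between changes the answer.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, z) for z in v]

# Hypothetical weights for two layers (biases left out to keep the arithmetic short)
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 0.5]

# Two linear layers with no activation in between ...
two_linear = matvec(W2, matvec(W1, x))
# ... give the same output as one linear layer with weights W2 * W1
one_linear = matvec(matmul(W2, W1), x)

# With ReLU between the layers, the output differs from the collapsed version
with_relu = matvec(W2, relu(matvec(W1, x)))

print(two_linear, one_linear, with_relu)
```

Here the purely linear versions both produce `[0.0]`, while the ReLU version produces `[1.5]`, because ReLU zeroes out the negative intermediate value before the second layer sees it.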
| Function | Formula | Use case |
|---|---|---|
| ReLU | max(0, x) | Hidden layers (most common) |
| Sigmoid | 1/(1+e⁻ˣ) | Binary classification output |
| Softmax | e^(xᵢ) / Σⱼ e^(xⱼ) | Multi-class output |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | RNNs, some hidden layers |
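
The formulas in the table translate almost directly into code. The sketch below uses only Python's standard `math` module; the function names are chosen for illustration, and the scalar versions are not hardened against overflow for extreme inputs.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def softmax(xs):
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Softmax turns arbitrary scores into probabilities that sum to 1
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

Unlike the other three, softmax takes a whole vector of scores rather than a single number, which is why it is used on the output layer for multi-class problems.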
The "deep" in deep learning refers to having many hidden layers.
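In the forward pass, data flows through the network layer by layer, producing a prediction.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))  # shift by the max for numerical stability
    return e / np.sum(e, axis=-1, keepdims=True)

# Simplified forward pass (biases omitted)
def forward(x, weights):
    h1 = relu(x @ weights[0])            # hidden layer 1
    h2 = relu(h1 @ weights[1])           # hidden layer 2
    output = softmax(h2 @ weights[2])    # output layer: class probabilities
    return output
```

The loss function measures how wrong the prediction is.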
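
```python
import numpy as np

# Cross-entropy loss for classification (y_true is one-hot, y_pred holds probabilities)
y_true = np.array([1.0, 0.0])   # true class: cat
y_pred = np.array([0.9, 0.1])   # predicted: 90% cat, 10% dog
loss = -np.sum(y_true * np.log(y_pred))
# If predicted "cat" with 90% confidence and it IS a cat → low loss (-log 0.9 ≈ 0.105)
# If predicted "cat" with 90% confidence and it's a dog → high loss (-log 0.1 ≈ 2.303)
```

Backpropagation calculates how much each weight contributed to the error, and gradient descent then adjusts each weight to reduce it.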
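
How much a weight contributed to the error is its gradient: the derivative of the loss with respect to that weight. As a minimal sketch, assume a single sigmoid neuron with binary cross-entropy loss, which is a simpler setup than a multi-class network, and where all names and values are illustrative. The chain rule then gives the gradient in closed form, and a finite-difference estimate confirms it:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy for one example and a single sigmoid neuron."""
    p = sigmoid(w * x + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def gradients(w, b, x, y):
    """Chain rule: dL/dz = p - y, then dz/dw = x and dz/db = 1."""
    p = sigmoid(w * x + b)
    dz = p - y
    return dz * x, dz

# Hypothetical values chosen only for illustration
w, b, x, y = 0.5, -0.2, 2.0, 1.0
dw, db = gradients(w, b, x, y)

# Sanity check against a central finite-difference estimate of dL/dw
eps = 1e-6
dw_numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
print(dw, dw_numeric)
```

In a deep network, backpropagation applies the same chain rule layer by layer, working from the output back toward the input.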
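
```python
# Gradient descent weight update: step against the gradient of the loss
weight = weight - learning_rate * gradient
```

The learning rate controls how big each update step is. Too small and training is slow; too large and the updates overshoot the minimum and can diverge.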
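
The effect of the step size is easiest to see on a toy problem. The sketch below assumes a one-parameter loss f(w) = w², whose gradient is 2w and whose minimum is at w = 0; the function name, starting point, and the three learning rates are arbitrary choices for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize f(w) = w**2, whose gradient is 2*w, starting from `start`."""
    w = start
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w

too_small = gradient_descent(0.001)  # barely moves away from 5.0
reasonable = gradient_descent(0.1)   # gets close to the minimum at 0
too_large = gradient_descent(1.1)    # overshoots further on every step and diverges

print(too_small, reasonable, too_large)
```

After 20 steps the smallest rate has hardly moved, the middle one is close to zero, and the largest has ended up much further from the minimum than where it started.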

Different network architectures suit different kinds of data:

| Type | Best for | Key idea |
|---|---|---|
| MLP (Fully Connected) | Tabular data | Every neuron connects to every next neuron |
| CNN | Images, video | Shared filters detect local patterns |
| RNN/LSTM | Sequences, time series | Hidden state carries memory |
| Transformer | Text, code, multimodal | Attention — relate any position to any other |
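
To make the attention idea in the last row concrete, here is a heavily simplified sketch of scaled dot-product self-attention in plain Python. It assumes the queries, keys, and values are all the raw input vectors; real Transformers use learned projection matrices and multiple heads, which are left out here. The names `self_attention` and `tokens` are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no learned projections)."""
    d = len(X[0])
    out = []
    for q in X:
        # Score this position against every position, including itself
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # The output is the attention-weighted average of all positions
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Three hypothetical 2-dimensional token vectors
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
print(attended)
```

Every output vector is a mix of all input positions, weighted by similarity, which is how any position can draw on information from any other in a single step.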
Each layer learns increasingly abstract representations:
Image of a cat:
Layer 1: edges and corners
Layer 2: shapes (circles, curves)
Layer 3: parts (eyes, ears, whiskers)
Layer 4: "cat-ness"
Output: 97% cat, 3% dog
This hierarchical feature learning is why deep networks are so powerful — they build up complex understanding from simple patterns.