What is a Neural Network?

The Biological Inspiration

A neural network is loosely inspired by the human brain. The brain has ~86 billion neurons connected by synapses. When a neuron receives enough signal, it fires and passes the signal forward.

Artificial neural networks mimic this with mathematical neurons connected in layers.

A Single Neuron

A neuron:

Takes multiple inputs
Multiplies each by a weight (how important is this input?)
Adds a bias (shifts the output)
Passes through an activation function (adds non-linearity)

def neuron(inputs, weights, bias):
    # Weighted sum
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function (ReLU)
    return max(0, z)
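As a quick check of the arithmetic, the sketch below calls the same neuron function on made-up values. The inputs, weights, and biases are illustrative assumptions, chosen so that one call is clipped by ReLU and the other is not.

```python
def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU activation: negative sums are clipped to zero
    return max(0, z)

inputs = [1.0, 2.0]     # illustrative values, not real data
weights = [0.5, -1.0]

# z = 1.0*0.5 + 2.0*(-1.0) + 0.5 = -1.0, so ReLU outputs 0
silent = neuron(inputs, weights, bias=0.5)

# z = 1.0*0.5 + 2.0*(-1.0) + 2.0 = 0.5, so ReLU passes it through
active = neuron(inputs, weights, bias=2.0)

print(silent, active)
```

Only the bias changed between the two calls, which shows how the bias shifts the point at which the neuron starts to respond.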

Activation Functions

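Activation functions add non-linearity. Without them, stacking layers gains nothing: a composition of linear layers is itself a single linear transformation, so the network is no more expressive than a linear model, no matter how many layers you add.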
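The collapse of stacked linear layers can be checked numerically. The following is a minimal sketch using NumPy with random matrices; the shapes and the seed are arbitrary assumptions, and biases are left out to keep the algebra short.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # 4 samples, 3 features (arbitrary sizes)
W1 = rng.normal(size=(3, 5))   # first "layer"
W2 = rng.normal(size=(5, 2))   # second "layer"

# Two stacked layers with no activation in between...
two_layers = (x @ W1) @ W2
# ...give the same result as one layer whose weights are W1 @ W2.
one_layer = x @ (W1 @ W2)
collapses = np.allclose(two_layers, one_layer)

# With ReLU between the layers, the single matrix no longer reproduces the output.
with_relu = np.maximum(0, x @ W1) @ W2
differs = not np.allclose(with_relu, one_layer)

print(collapses, differs)
```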

Function	Formula	Use case
ReLU	max(0, x)	Hidden layers (most common)
Sigmoid	1/(1+e⁻ˣ)	Binary classification output
Softmax	e^(x_i) / Σ_j e^(x_j)	Multi-class output
Tanh	(eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)	RNNs, some hidden layers
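The formulas in the table translate almost directly into NumPy. This is a sketch rather than a library-grade implementation: the softmax here handles a single 1-D vector of scores, and the max-subtraction trick is an added assumption for numerical safety, not part of the formula itself.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    shifted = np.exp(x - np.max(x))
    return shifted / np.sum(shifted)

def tanh(x):
    return np.tanh(x)

scores = np.array([2.0, 1.0, 0.1])   # illustrative raw scores for 3 classes
probs = softmax(scores)              # non-negative values that sum to 1
print(probs)
```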

A Full Neural Network

Input layer — raw data (pixels, numbers, tokens)
Hidden layers — learn intermediate representations
Output layer — final prediction

The "deep" in deep learning refers to having many hidden layers.

How Training Works

Forward Pass

Data flows through the network, producing a prediction.

# Simplified forward pass (biases omitted; relu and softmax are helper
# functions assumed to be defined elsewhere)
def forward(x, weights):
    h1 = relu(x @ weights[0])   # hidden layer 1
    h2 = relu(h1 @ weights[1])  # hidden layer 2
    output = softmax(h2 @ weights[2])  # output
    return output
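To run the forward pass end to end, the helpers and some weights have to exist. The sketch below fills those in under stated assumptions: the layer sizes (4, 8, 8, 3), the random initialization, and the batch of two inputs are all hypothetical, and biases are still omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, weights):
    h1 = relu(x @ weights[0])         # hidden layer 1
    h2 = relu(h1 @ weights[1])        # hidden layer 2
    return softmax(h2 @ weights[2])   # class probabilities

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]            # hypothetical: 4 inputs, 3 classes
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = rng.normal(size=(2, 4))           # batch of 2 made-up samples
probs = forward(x, weights)
print(probs.shape)                    # one row of probabilities per sample
```

Because the weights are random and untrained, the probabilities carry no meaning yet; the point is only that the shapes line up and each row sums to 1.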

Loss Function

Measures how wrong the prediction is.

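# Cross-entropy loss for classification
# (y_true is a one-hot label, y_pred holds the predicted probabilities)
import numpy as np
y_true = np.array([1.0, 0.0])   # classes are [cat, dog]; the image IS a cat
y_pred = np.array([0.9, 0.1])   # predicted "cat" with 90% confidence
loss = -np.sum(y_true * np.log(y_pred))   # -log(0.9) ≈ 0.105 → low loss
# Same prediction but the image is a dog (y_true = [0, 1]):
# loss = -log(0.1) ≈ 2.303 → high loss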
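The same loss can be wrapped in a small reusable function. This is a sketch: the function name cross_entropy and the eps clipping (which keeps log from ever receiving 0) are assumptions added here, not something the formula requires.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # eps is an assumed safeguard so that log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_pred = np.array([0.9, 0.1])   # model says 90% cat, 10% dog

loss_if_cat = cross_entropy(np.array([1.0, 0.0]), y_pred)   # equals -log(0.9)
loss_if_dog = cross_entropy(np.array([0.0, 1.0]), y_pred)   # equals -log(0.1)

print(loss_if_cat < loss_if_dog)
```

A confident wrong answer is penalized far more heavily than a confident right answer is rewarded, which is what pushes the network away from overconfident mistakes.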

Backpropagation

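Backpropagation applies the chain rule backward through the network to compute the gradient of the loss with respect to every weight, that is, how much each weight contributed to the error. Gradient descent then adjusts each weight in the direction that reduces the loss.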

# Gradient descent weight update (pseudo-code)
weight = weight - learning_rate * gradient
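To make the update rule concrete, here is a deliberately tiny case: a single linear neuron (no activation) trained with mean squared error, so the chain rule has only one step. The function name, the data, and the hyperparameters are all illustrative assumptions; a real network repeats the same idea across many layers.

```python
def train_linear_neuron(xs, ys, learning_rate=0.05, epochs=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y        # forward pass: prediction minus target
            # loss = error**2, so by the chain rule:
            grad_w += 2 * error * x / n    # d(loss)/dw, averaged over the data
            grad_b += 2 * error / n        # d(loss)/db, averaged over the data
        # Gradient descent update for both parameters
        w = w - learning_rate * grad_w
        b = b - learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]                  # generated from y = 2x + 1
w, b = train_linear_neuron(xs, ys)
print(round(w, 3), round(b, 3))
```

The learned w and b should end up close to the 2 and 1 used to generate the data.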

The learning rate controls how big each update step is:

Too high → overshoots, unstable training
Too low → takes forever to converge
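Both failure modes show up even on a toy problem. The sketch below minimizes f(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0; the function name descend and the three learning rates are assumptions picked to make the contrast visible.

```python
def descend(learning_rate, steps=50, start=1.0):
    w = start
    for _ in range(steps):
        gradient = 2 * w                      # derivative of f(w) = w**2
        w = w - learning_rate * gradient      # gradient descent update
    return w

well_chosen = descend(0.1)     # shrinks steadily toward 0
too_high = descend(1.1)        # each step overshoots further; |w| grows
too_low = descend(0.001)       # barely moves from the starting point

print(abs(well_chosen), abs(too_high), abs(too_low))
```

After the same 50 steps, the well-chosen rate is essentially at the minimum, the high rate has diverged, and the low rate has covered less than a tenth of the distance.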

Types of Neural Networks

Type	Best for	Key idea
MLP (Fully Connected)	Tabular data	Every neuron connects to every neuron in the next layer
CNN	Images, video	Shared filters detect local patterns
RNN/LSTM	Sequences, time series	Hidden state carries memory
Transformer	Text, code, multimodal	Attention — relate any position to any other
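The "shared filters" idea behind CNNs can be shown in one dimension. In this sketch the same two-weight filter is slid across a short signal; the function name slide_filter, the signal, and the filter values are illustrative assumptions, and real CNNs learn their filter weights rather than having them set by hand.

```python
def slide_filter(signal, kernel):
    # Apply the same kernel weights at every position (weight sharing)
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 0]        # a flat region, a step up, a step down
edge_filter = [-1, 1]           # responds to changes between neighbors

response = slide_filter(signal, edge_filter)
print(response)                 # [0, 1, 0, -1]
```

The filter outputs 1 at the rising edge and -1 at the falling edge, wherever they occur, using only two weights for the whole signal.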

What Neural Networks Actually Learn

Successive layers tend to learn increasingly abstract representations. A simplified illustration:

Image of a cat:
Layer 1: edges and corners
Layer 2: shapes (circles, curves)
Layer 3: parts (eyes, ears, whiskers)
Layer 4: "cat-ness"
Output: 97% cat, 3% dog

This hierarchical feature learning is a large part of why deep networks work so well: they build complex representations out of simple patterns.

Key Takeaway

A neuron = weighted sum + bias + activation function
A neural network = many neurons in layers
Training = forward pass → loss → backpropagation → weight update → repeat
Deep = many hidden layers → learns hierarchical features
Different architectures (CNN, RNN, Transformer) are optimized for different data types
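The training cycle summarized above can be put together in a few dozen lines. The following is a minimal sketch, not a production implementation: it trains a network with one hidden layer on the XOR problem using NumPy, and the function name, hidden size, learning rate, step count, and seed are all assumptions chosen for illustration.

```python
import numpy as np

def train_tiny_mlp(X, y_onehot, hidden=8, learning_rate=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, size=(hidden, y_onehot.shape[1]))
    b2 = np.zeros(y_onehot.shape[1])
    n = X.shape[0]
    losses = []
    for _ in range(steps):
        # Forward pass
        z1 = X @ W1 + b1
        h = np.maximum(0, z1)                          # ReLU hidden layer
        logits = h @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)    # stabilize softmax
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Loss: mean cross-entropy over the batch
        loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=1))
        losses.append(loss)
        # Backpropagation (chain rule, output layer first)
        d_logits = (probs - y_onehot) / n
        dW2 = h.T @ d_logits
        db2 = d_logits.sum(axis=0)
        d_h = d_logits @ W2.T
        d_z1 = d_h * (z1 > 0)                          # gradient of ReLU
        dW1 = X.T @ d_z1
        db1 = d_z1.sum(axis=0)
        # Weight update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return losses

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])   # XOR, one-hot
losses = train_tiny_mlp(X, y)
print(losses[0], losses[-1])
```

The loss at the end of training should be lower than at the start; how low it gets depends on the assumed hyperparameters and the random initialization.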