The Two Core Performance Metrics
When measuring system performance, two numbers matter most: latency and throughput. They sound similar but measure completely different things — and optimizing for one can hurt the other.
Latency
Latency is the time it takes to complete a single operation — from request to response.
Common latency benchmarks
| Operation | Typical Latency |
|---|---|
| L1 cache read | ~1 ns |
| L2 cache read | ~4 ns |
| RAM read | ~100 ns |
| SSD read | ~100 µs |
| Network round trip (same DC) | ~500 µs |
| HDD read | ~10 ms |
| Network round trip (cross-region) | ~150 ms |
These numbers are worth memorizing. A database call in the same data center is ~1 ms; a cross-continent network call is ~150 ms, roughly 150 times slower.
Types of latency
Latency is usually reported as percentiles: p50 (the median), p95, and p99 (tail latency). Always look at p99, not just the average. If your average is 50 ms but your p99 is 5 seconds, 1 in 100 requests gets a terrible experience.
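A minimal sketch (nearest-rank percentiles; the sample numbers are hypothetical) shows why the mean hides the tail:

```python
import math

def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile; p is in (0, 100]."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# Hypothetical samples: most requests fast, a couple very slow.
latencies = [50.0] * 98 + [2_000.0, 5_000.0]

print(sum(latencies) / len(latencies))  # mean ~119 ms: looks fine
print(percentile(latencies, 50))        # p50 = 50 ms: median looks fine
print(percentile(latencies, 99))        # p99 = 2000 ms: the tail hurts
```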
Throughput
Throughput is the number of operations a system can handle per unit of time.
Measured as:
- RPS — requests per second (APIs)
- TPS — transactions per second (databases)
- Mbps/Gbps — megabits or gigabits per second (networks)
What limits throughput?
Whichever resource saturates first: CPU, memory, disk I/O, network bandwidth, or contention on shared resources such as locks, thread pools, and database connections. The first bottleneck sets the ceiling; adding capacity anywhere else changes nothing.
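For example, bandwidth alone puts a hard ceiling on request rate. A back-of-envelope sketch (link speed and response size are hypothetical):

```python
# How many 10 KB responses fit through a 1 Gbps link per second?
LINK_GBPS = 1.0      # hypothetical network link
RESPONSE_KB = 10.0   # hypothetical response size

bytes_per_s = LINK_GBPS * 1e9 / 8              # 1 Gbps = 125 MB/s
rps_ceiling = bytes_per_s / (RESPONSE_KB * 1_000)
print(f"{rps_ceiling:,.0f} RPS")               # 12,500 RPS from bandwidth alone
```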
The Relationship: Little's Law
Little's Law says that, in a stable system, Concurrency = Throughput × Latency. Rearranged for throughput:
Throughput = Concurrency / Latency
If your system handles 100 concurrent requests and each takes 100ms:
Throughput = 100 / 0.1s = 1000 RPS
To double throughput, you can either:
- Double concurrency (add more servers)
- Halve latency (make each request faster)
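A few lines of Python make both levers concrete, using the numbers from the example above:

```python
def throughput_rps(concurrency: float, latency_s: float) -> float:
    """Little's Law rearranged: throughput = concurrency / latency."""
    return concurrency / latency_s

print(throughput_rps(100, 0.10))  # 1000.0 RPS -- the example above
print(throughput_rps(200, 0.10))  # 2000.0 RPS -- double the concurrency
print(throughput_rps(100, 0.05))  # 2000.0 RPS -- halve the latency
```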
The Latency vs Throughput Trade-off
Batching is the canonical trade-off: grouping many operations into one amortizes a fixed per-operation cost (a network round trip, an fsync, a transaction commit) across the whole group, so throughput rises, while each individual item now waits for its batch to fill, so latency rises. Kafka producers, database bulk inserts, and HTTP/2 multiplexing all use batching to maximize throughput at the cost of some latency.
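A toy batcher makes the mechanism visible. This is a sketch, not any real library's API: `flush_fn` stands in for whatever fixed-cost operation is being amortized.

```python
import time

class Batcher:
    """Toy batcher: flush when the batch is full or the oldest item is stale."""

    def __init__(self, flush_fn, max_batch=100, max_wait_s=0.01):
        self.flush_fn = flush_fn      # the expensive fixed-cost operation
        self.max_batch = max_batch    # throughput knob: items per flush
        self.max_wait_s = max_wait_s  # latency knob: max time an item waits
        self.buffer = []
        self.oldest = 0.0

    def add(self, item):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(item)
        full = len(self.buffer) >= self.max_batch
        stale = time.monotonic() - self.oldest >= self.max_wait_s
        if full or stale:
            self.flush_fn(self.buffer)  # one flush pays the overhead once
            self.buffer = []

batcher = Batcher(lambda batch: print(f"flushed {len(batch)} items"))
for i in range(250):
    batcher.add(i)  # prints "flushed 100 items" twice; 50 items stay buffered
```

Kafka's producer exposes the same two knobs as `batch.size` and `linger.ms`. (A production batcher would also flush from a background timer so a quiet stream can't strand buffered items.)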
How to Optimize Each
Reducing Latency
- Caching — serve from memory instead of disk/network (see the sketch after this list)
- CDN — serve static assets from edge nodes close to users
- Database indexes — avoid full table scans
- Connection pooling — reuse DB connections instead of creating new ones
- Async processing — don't make users wait for non-critical work
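To make the caching item concrete, here is a minimal read-through cache with a TTL; `load_from_db` is a hypothetical stand-in for the slow path:

```python
import time

_cache: dict[str, tuple[float, str]] = {}  # key -> (stored_at, value)
TTL_S = 60.0

def load_from_db(key: str) -> str:
    time.sleep(0.005)  # stand-in for a ~5 ms database read
    return f"value-for-{key}"

def get(key: str) -> str:
    hit = _cache.get(key)
    if hit is not None and time.monotonic() - hit[0] < TTL_S:
        return hit[1]              # memory hit: nanoseconds, not milliseconds
    value = load_from_db(key)      # slow path only on a miss or expiry
    _cache[key] = (time.monotonic(), value)
    return value

get("user:42")  # first call pays the database latency
get("user:42")  # calls within the next 60 s are served from memory
```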
Increasing Throughput
- Horizontal scaling — more servers = more parallel processing
- Async I/O — don't block threads waiting for I/O (see the sketch after this list)
- Batching — process multiple items in one operation
- Compression — send less data over the network
- Load balancing — distribute work evenly
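For the async I/O item, a sketch with `asyncio`: the 100 ms sleep stands in for a network call, and one thread overlaps 100 of them, so throughput scales with concurrency exactly as Little's Law predicts.

```python
import asyncio
import time

async def fake_request(i: int) -> int:
    await asyncio.sleep(0.1)  # waiting on I/O, not burning CPU
    return i

async def main() -> None:
    start = time.monotonic()
    results = await asyncio.gather(*(fake_request(i) for i in range(100)))
    elapsed = time.monotonic() - start
    # ~0.1 s of wall time for 100 requests, not 10 s: per-request latency
    # is unchanged, but throughput is ~100x (100 / 0.1 s = 1000 RPS).
    print(f"{len(results)} requests in {elapsed:.2f} s")

asyncio.run(main())
```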
Key Takeaway
| | Latency | Throughput |
|---|---|---|
| Measures | Speed of one request | Volume of requests per second |
| Goal | As low as possible | As high as possible |
| Optimized by | Caching, indexes, async | Scaling, batching, parallelism |
| Hurt by | Network hops, blocking I/O | Resource contention, single-threading |
Always define your latency and throughput requirements before designing a system. They drive every architectural decision.