Latency vs Throughput

Two of the most important performance metrics in system design — what they mean, how they differ, and how to optimize for each.


The Two Core Performance Metrics

When measuring system performance, two numbers matter most: latency and throughput. They sound similar but measure completely different things — and optimizing for one can hurt the other.


Latency

Latency is the time it takes to complete a single operation — from request to response.

Common latency benchmarks

| Operation | Typical Latency |
| --- | --- |
| L1 cache read | ~1 ns |
| L2 cache read | ~4 ns |
| RAM read | ~100 ns |
| SSD read | ~100 µs |
| Network round trip (same DC) | ~500 µs |
| HDD read | ~10 ms |
| Network round trip (cross-region) | ~150 ms |

These numbers are worth memorizing: a typical database query is ~1 ms, and a cross-region network call is ~150 ms.
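As a rough illustration, the reference numbers above can drive back-of-envelope estimates. A minimal sketch, where the latency constants and the request shape are illustrative assumptions, not measurements:

```python
# Approximate reference latencies in seconds (hardware- and network-dependent).
SAME_DC_RTT  = 500e-6   # ~500 µs network round trip within a data center
CROSS_REGION = 150e-3   # ~150 ms network round trip across regions
DB_QUERY     = 1e-3     # ~1 ms typical database query

# Hypothetical request: three sequential DB queries plus one cross-region call.
total = 3 * (SAME_DC_RTT + DB_QUERY) + CROSS_REGION
print(f"Estimated latency: {total * 1000:.1f} ms")   # ~154.5 ms

# The single cross-region hop dominates; the DB work is noise by comparison.
```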

Types of latency

- Average (mean): easy to report, but it hides outliers
- Median (p50): what a typical request experiences
- Tail (p95, p99): the slowest few percent of requests

Always look at p99, not just the average. If your average is 50 ms but your p99 is 5 seconds, 1 in 100 users has a terrible experience.
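A quick way to see the difference is to compute percentiles over recorded request times. A minimal sketch, assuming you already have a list of latency samples in milliseconds:

```python
import math

def percentile(samples, p):
    """Return the p-th percentile (0-100) of samples, nearest-rank method."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical samples: mostly fast, with a few slow outliers.
latencies_ms = [45] * 97 + [3000, 4500, 5000]

print("average:", sum(latencies_ms) / len(latencies_ms))   # ~169 ms
print("p50:", percentile(latencies_ms, 50))                # 45 ms
print("p99:", percentile(latencies_ms, 99))                # 4500 ms
```

The average looks healthy while the p99 exposes the outliers, which is exactly why dashboards should track percentiles.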


Throughput

Throughput is the number of operations a system can handle per unit of time.

Measured as:

- Requests per second (RPS) or queries per second (QPS) for online services
- Transactions per second (TPS) for databases
- Bytes or messages per second for data pipelines
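One simple way to measure it is to count completed operations over a fixed window. A minimal sketch, where handle_request() is a hypothetical stand-in for the work being measured:

```python
import time

def handle_request():
    time.sleep(0.001)   # stand-in for ~1 ms of real work

window_s = 2.0
completed = 0
start = time.perf_counter()
while time.perf_counter() - start < window_s:
    handle_request()
    completed += 1

elapsed = time.perf_counter() - start
print(f"Throughput: {completed / elapsed:.0f} requests/sec")
```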

What limits throughput?

Usually whichever resource saturates first: CPU, memory, disk I/O, network bandwidth, database connections, or lock contention on shared state. Single-threaded stages are a common ceiling.


The Relationship: Little's Law

Little's Law says that, in a stable system, the number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. Rearranged for throughput:

Throughput = Concurrency / Latency

If your system handles 100 concurrent requests and each takes 100 ms:

Throughput = 100 / 0.1 s = 1000 RPS
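The same arithmetic written out as a tiny helper (the numbers are just the example values above):

```python
def throughput_rps(concurrency, latency_s):
    """Little's Law rearranged: throughput = concurrency / latency."""
    return concurrency / latency_s

print(throughput_rps(100, 0.100))   # 1000.0 RPS
print(throughput_rps(200, 0.100))   # 2000.0 RPS -- double the concurrency
print(throughput_rps(100, 0.050))   # 2000.0 RPS -- halve the latency
```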

To double throughput, you can either:

- Double the concurrency the system can sustain (more workers, more connections), or
- Halve the latency of each request.


The Latency vs Throughput Trade-off

Many high-throughput systems deliberately trade latency away. The classic mechanism is batching: grouping many small operations into one larger one amortizes per-operation overhead, which raises throughput, but individual operations now wait for a batch to fill or a timer to fire, which raises latency. Kafka, database bulk inserts, and HTTP/2 multiplexing all use batching to maximize throughput at the cost of some latency.
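A minimal sketch of a size-or-time batcher, with a hypothetical send_batch() standing in for a real sink such as a message broker or a bulk-insert API:

```python
import time

def send_batch(items):
    # Hypothetical expensive call that carries many items at once.
    print(f"flushed {len(items)} items")

class Batcher:
    """Collect items and flush when the batch is full or has waited too long."""

    def __init__(self, max_items=100, max_wait_s=0.05):
        self.max_items = max_items
        self.max_wait_s = max_wait_s   # extra latency we accept per item
        self.items = []
        self.oldest = None             # when the current batch started

    def add(self, item):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.items.append(item)
        full = len(self.items) >= self.max_items
        stale = time.monotonic() - self.oldest >= self.max_wait_s
        if full or stale:
            self.flush()

    def flush(self):
        if self.items:
            send_batch(self.items)
            self.items = []
            self.oldest = None

batcher = Batcher(max_items=10)
for i in range(25):
    batcher.add(i)      # most items sit in memory before being sent
batcher.flush()         # flush the leftovers
```

Larger batches mean fewer expensive flushes per item (higher throughput), but the first item in each batch waits the longest (higher latency).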


How to Optimize Each

Reducing Latency

- Cache hot data close to where it is used (in-memory caches, CDNs)
- Add database indexes so queries touch less data
- Cut network hops and avoid blocking I/O on the request path
- Move non-critical work off the critical path with async processing
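As one concrete example of the caching point, a cache-aside sketch; fetch_from_db() is a hypothetical slow lookup used only for illustration:

```python
import time

cache = {}

def fetch_from_db(key):
    time.sleep(0.01)               # hypothetical ~10 ms database lookup
    return f"value-for-{key}"

def get(key):
    # Cache-aside: check the cache first, fall back to the database on a miss.
    if key in cache:
        return cache[key]
    value = fetch_from_db(key)
    cache[key] = value
    return value

get("user:42")   # miss: pays the ~10 ms database latency
get("user:42")   # hit: returns in microseconds from memory
```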

Increasing Throughput

- Scale horizontally: add machines behind a load balancer
- Batch small operations into larger ones
- Run independent work in parallel instead of sequentially
- Remove single-threaded bottlenecks and resource contention
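And one sketch of the parallelism point, overlapping I/O-bound calls with a thread pool; fetch() is a stand-in for any blocking call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    time.sleep(0.1)    # stand-in for an I/O-bound call taking ~100 ms
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, range(50)))
elapsed = time.perf_counter() - start

# 50 calls at ~100 ms each: ~5 s sequentially, ~0.5 s with 10 workers.
print(f"{len(results)} calls in {elapsed:.2f} s")
```

Latency per call is unchanged; throughput goes up because the calls overlap.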


Key Takeaway

| | Latency | Throughput |
| --- | --- | --- |
| Measures | Speed of one request | Volume of requests per second |
| Goal | As low as possible | As high as possible |
| Optimized by | Caching, indexes, async | Scaling, batching, parallelism |
| Hurt by | Network hops, blocking I/O | Resource contention, single-threading |

Always define your latency and throughput requirements before designing a system. They drive every architectural decision.