Two of the most important performance metrics in system design — what they mean, how they differ, and how to optimize for each.
When measuring system performance, two numbers matter most: latency and throughput. They sound similar but measure completely different things — and optimizing for one can hurt the other.
Latency is the time it takes to complete a single operation — from request to response.
| Operation | Typical Latency |
|---|---|
| L1 cache read | ~1 ns |
| L2 cache read | ~4 ns |
| RAM read | ~100 ns |
| SSD read | ~100 µs |
| Network round trip (same DC) | ~500 µs |
| HDD read | ~10 ms |
| Network round trip (cross-region) | ~150 ms |
These numbers are worth memorizing as orders of magnitude. A database call within the same data center is roughly 1 ms, while a cross-region network call is roughly 150 ms.
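As a rough illustration of how these figures add up, the sketch below sums the sequential steps of a request using the values from the table. The dictionary keys, the `estimate_ms` helper, and the request path are hypothetical names chosen for this example, not measurements of a real system.

```python
# Approximate latencies from the table, in nanoseconds.
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "ram": 100,
    "ssd": 100_000,
    "same_dc_rtt": 500_000,
    "hdd": 10_000_000,
    "cross_region_rtt": 150_000_000,
}


def estimate_ms(steps):
    """Sum sequential steps given as (operation, count) pairs; result in ms."""
    total_ns = sum(LATENCY_NS[op] * count for op, count in steps)
    return total_ns / 1_000_000


# Hypothetical request: one cross-region hop, three in-DC hops, two SSD reads.
request_path = [("cross_region_rtt", 1), ("same_dc_rtt", 3), ("ssd", 2)]
total_ms = estimate_ms(request_path)
print(f"estimated latency: {total_ms:.1f} ms")
```

In this made-up path the single cross-region hop accounts for almost all of the total, which is why the slowest rows of the table deserve the most attention.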
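Always look at tail latency such as p99 (the 99th percentile), not just the average. If your average is 50 ms but p99 is 5 seconds, 1 in 100 requests takes 5 seconds or longer, and the users behind those requests have a terrible experience.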
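To make the gap between average and tail latency concrete, here is a minimal sketch that computes a nearest-rank percentile over a list of samples. The `percentile` helper and the sample data are assumptions made up for illustration; production systems usually rely on histogram-based metrics rather than sorting raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [10] * 98 + [5000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

Here the median stays at 10 ms and the mean looks tolerable, while p99 exposes the 5-second outliers.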
Throughput is the number of operations a system can handle per unit of time.
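It is typically measured in requests per second (RPS). For a system in steady state, Little's Law relates it to concurrency and latency:
Throughput = Concurrency / Latency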
If your system handles 100 concurrent requests and each takes 100 ms (0.1 s):
Throughput = 100 / 0.1 s = 1,000 RPS
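The same arithmetic can be sketched in a few lines of code. The function names below are hypothetical, and the relationship only holds for a system in steady state where every in-flight request is actively being served.

```python
def throughput_rps(concurrency, latency_ms):
    """Requests per second sustained by `concurrency` in-flight requests."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return concurrency * 1000 / latency_ms


def required_concurrency(target_rps, latency_ms):
    """In-flight requests needed to sustain `target_rps` at a given latency."""
    return target_rps * latency_ms / 1000


baseline = throughput_rps(concurrency=100, latency_ms=100)
needed = required_concurrency(target_rps=2000, latency_ms=100)
print(f"baseline: {baseline:.0f} RPS")
print(f"concurrency needed for 2000 RPS at 100 ms: {needed:.0f}")
```

Running the formula in reverse is often the more useful direction: given a target RPS and a measured latency, it estimates how many workers or connections the system needs.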
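To double throughput, you can either double the concurrency (for example, by adding workers or machines) or halve the latency of each request.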
Kafka producers and database bulk inserts use batching to maximize throughput at the cost of some latency. HTTP/2 multiplexing raises throughput in a related way, by sharing one connection among many concurrent requests.
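The batching trade-off can be illustrated with a simple cost model, assuming each call pays a fixed overhead plus a small per-item cost. The `batch_stats` function and its default numbers (5 ms overhead, 0.1 ms per item) are invented for this sketch and do not describe Kafka or any particular database.

```python
def batch_stats(batch_size, overhead_ms=5.0, per_item_ms=0.1):
    """Return (items per second, time per batch in ms) for one sequential worker."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch_time_ms = overhead_ms + batch_size * per_item_ms
    items_per_second = batch_size / batch_time_ms * 1000
    return items_per_second, batch_time_ms


for size in (1, 10, 100, 1000):
    rate, time_ms = batch_stats(size)
    print(f"batch={size:>4}  throughput={rate:>8.0f} items/s  batch latency={time_ms:>6.1f} ms")
```

Under this model, larger batches amortize the fixed overhead, so throughput climbs toward the per-item limit while every item waits longer for its batch to complete.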
| | Latency | Throughput |
|---|---|---|
| Measures | Speed of one request | Volume of requests per second |
| Goal | As low as possible | As high as possible |
| Optimized by | Caching, indexes, async | Scaling, batching, parallelism |
| Hurt by | Network hops, blocking I/O | Resource contention, single-threading |
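The difference between the two metrics can also be observed by measuring both at once. The sketch below assumes a hypothetical I/O-bound handler simulated with `time.sleep` and uses a thread pool to vary concurrency; exact numbers will differ by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_):
    """Stand-in for an I/O-bound handler; returns its own latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated 10 ms of blocking I/O
    return time.perf_counter() - start


def measure(workers, requests=20):
    """Return (mean latency in ms, throughput in requests/s) for a worker pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(handle_request, range(requests)))
    elapsed = time.perf_counter() - start
    mean_latency_ms = sum(latencies) / len(latencies) * 1000
    return mean_latency_ms, requests / elapsed


for workers in (1, 4):
    latency_ms, rps = measure(workers)
    print(f"workers={workers}  mean latency={latency_ms:.1f} ms  throughput={rps:.0f} req/s")
```

With more workers, per-request latency should stay close to 10 ms while throughput rises, which matches the "parallelism" entry in the throughput column.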
Always define your latency and throughput requirements before designing a system. They drive every architectural decision.