free_tool

Throughput & Concurrency Calculator

How many instances does your traffic actually need? Enter requests per second and average latency, and Little's Law gives you the in-flight concurrency you have to support, plus the fleet size that serves it at your target utilization, so you can stop guessing and over-provisioning.

Throughput (requests / second, rps)

Average latency (ms)

Concurrency per instance

Target utilization (%)

Instances needed: 12 @ 70% target utilization

Actual utilization ≈ 66.7% once rounded up

Required concurrency (L = λ·W): 400
Throughput / instance: 625 rps
Capacity at this fleet: 7,500 rps

Little's Law: in-flight requests = arrival rate × latency. Halve latency and you roughly halve the instances. Because latency is measured end to end, the slowest hop in the request path sets the fleet size.

A first-order model (L = λ·W). Real fleets also see queueing, GC pauses and connection limits, but this is the number every capacity plan starts from.

how_it_works

One identity behind every capacity plan

Little's Law says the average number of requests in flight equals arrival rate × average time in system: L = λ·W. At 5,000 req/s and 80 ms latency, that is 5,000 × 0.08 s = 400 requests in flight on average, whatever the arrival pattern or scheduling, as long as the system is stable.
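As a quick check of the arithmetic, here is a minimal Python sketch of the identity. The function name `in_flight` and its parameter names are made up for illustration and are not taken from the calculator itself.

```python
def in_flight(arrival_rate_rps: float, latency_ms: float) -> float:
    """Average requests in flight by Little's Law, L = lambda * W.

    arrival_rate_rps is lambda (requests per second); latency_ms is W,
    given in milliseconds and converted to seconds here.
    """
    return arrival_rate_rps * latency_ms / 1000.0


print(in_flight(5000, 80))  # 400.0
```

Note that the result is an average: instantaneous concurrency will fluctuate around it.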

From there it's division: a single instance holding 50 concurrent requests clears 50 ÷ 0.08 s = 625 req/s at full load. At a 70% target utilization, plan on 625 × 0.7 = 437.5 req/s per instance, so 5,000 req/s needs 5,000 ÷ 437.5 ≈ 11.4, rounded up to 12 instances. The lever that moves it most isn't more boxes; it's cutting W. Halve latency to 40 ms and the same traffic fits on 6 instances, half the fleet.
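The sizing steps above can be sketched in a few lines of Python. This is an illustrative reconstruction under the first-order model only, not the calculator's actual source; `size_fleet`, `FleetPlan`, and the field names are hypothetical, and target utilization is assumed to be passed as a fraction (0.70 for 70%).

```python
import math
from dataclasses import dataclass


@dataclass
class FleetPlan:
    required_concurrency: float     # L = lambda * W
    throughput_per_instance: float  # req/s one instance clears at full load
    instances: int                  # fleet size, rounded up
    actual_utilization: float       # load / capacity after rounding up
    fleet_capacity: float           # req/s the rounded-up fleet can clear


def size_fleet(rps: float, latency_ms: float,
               concurrency_per_instance: int,
               target_utilization: float) -> FleetPlan:
    """First-order fleet sizing from Little's Law (hypothetical helper)."""
    if rps < 0 or latency_ms <= 0 or concurrency_per_instance <= 0:
        raise ValueError("rps must be >= 0; latency and concurrency must be > 0")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be a fraction in (0, 1]")

    required_concurrency = rps * latency_ms / 1000.0
    # One instance with N slots, each busy for W seconds, clears N / W req/s.
    per_instance = concurrency_per_instance * 1000.0 / latency_ms
    usable = per_instance * target_utilization
    # round() guards against float noise pushing an exact integer up by one.
    instances = max(1, math.ceil(round(rps / usable, 9)))
    fleet_capacity = instances * per_instance
    return FleetPlan(required_concurrency, per_instance, instances,
                     rps / fleet_capacity, fleet_capacity)


plan = size_fleet(rps=5000, latency_ms=80,
                  concurrency_per_instance=50, target_utilization=0.70)
print(plan.instances)                              # 12
print(round(plan.actual_utilization * 100, 1))     # 66.7
print(size_fleet(5000, 40, 50, 0.70).instances)    # 6
```

The last line reruns the same traffic at 40 ms to show the halving effect. Like the calculator, this ignores queueing, GC pauses and connection limits.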