Concurrency
Concurrency is the number of requests a system is handling at the same instant. It determines how many workers, connections, and instances you need, and it is distinct from throughput (requests finished per second).
also: concurrency · requests in flight · concurrency vs throughput
Throughput is a rate (requests per second). Concurrency is a count (requests in flight right now). They are linked by latency through Little's Law: hold throughput fixed and slower requests pile up, so concurrency rises even though the rate has not. This is why a latency regression can exhaust a connection pool without any change in traffic.
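As a rough illustration of Little's Law, the sketch below computes requests in flight from throughput and average latency. The function name `in_flight` and the traffic figures are made up for the example, not taken from any real system.

```python
def in_flight(throughput_rps: float, avg_latency_ms: float) -> float:
    """Little's Law: concurrency = throughput x average latency."""
    # Latency is given in milliseconds, so divide by 1000 to get seconds.
    return throughput_rps * avg_latency_ms / 1000


# Hypothetical numbers: identical traffic before and after a latency regression.
before = in_flight(500, 100)  # 500 req/s at 100 ms average latency
after = in_flight(500, 400)   # 500 req/s at 400 ms average latency
print(before, after)
```

With these assumed figures, concurrency rises from 50 to 200 requests in flight at an unchanged 500 requests per second, which would be enough to overrun a 100-connection pool.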
Most scaling limits are concurrency limits in disguise. A database with 100 connections, a serverless platform with a max-instances cap, a thread pool of 200: each is a ceiling on simultaneous work. When demand pushes concurrency past the ceiling, new requests queue, latency climbs, and the longer queue feeds back into more concurrency, which is how systems fall over suddenly rather than gracefully.
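A minimal sketch of what happens on either side of a concurrency ceiling, under simplifying assumptions: arrivals are perfectly steady, every request takes the same service time, and that service time does not degrade under load. The function `backlog_over_time` and all numbers are hypothetical.

```python
def backlog_over_time(arrival_rps, ceiling, service_ms, seconds):
    """Return the number of queued requests at the end of each second."""
    # Little's Law rearranged: the highest throughput a ceiling can sustain
    # is ceiling / service time.
    max_rps = ceiling * 1000 / service_ms
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # Requests beyond what the ceiling can complete this second must wait.
        backlog = max(0.0, backlog + arrival_rps - max_rps)
        history.append(backlog)
    return history


# Hypothetical pool: 100 slots, 200 ms per request, so at most 500 req/s.
under = backlog_over_time(450, 100, 200, 5)  # demand below the ceiling
over = backlog_over_time(550, 100, 200, 5)   # demand 10% above the ceiling
print(under)
print(over)
```

Below the ceiling the backlog stays at zero; just above it the backlog grows every second and never drains. Because this model keeps service time fixed, it understates the feedback described above, where slower responses raise concurrency further.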
Questions & answers
- What is the difference between concurrency and throughput?
- Throughput is how many requests complete per unit time (a rate). Concurrency is how many are in progress at one moment (a count). Little's Law connects them: concurrency equals throughput times average latency.
- How do I size instances for a concurrency target?
- Find the concurrency you need (throughput times latency), divide by the concurrency one instance handles safely, then add headroom for spikes and for instances that are restarting or cold. Provision against peak throughput and tail latency, not the average.
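The sizing steps in the last answer can be sketched as a small calculation. This is one possible reading, assuming headroom is expressed as a percentage added on top of the required concurrency; the function `instances_needed` and the example figures are hypothetical.

```python
def instances_needed(peak_rps: int, tail_latency_ms: int,
                     per_instance: int, headroom_pct: int) -> int:
    """Instances required to hold peak concurrency plus headroom.

    Required concurrency is peak_rps * tail_latency_ms / 1000 (Little's Law).
    Integer arithmetic is used throughout so the final rounding is exact.
    """
    numerator = peak_rps * tail_latency_ms * (100 + headroom_pct)
    denominator = 1000 * 100 * per_instance
    # Ceiling division: always round up to a whole instance.
    return -(-numerator // denominator)


# Hypothetical service: 1200 req/s at peak, 250 ms tail latency,
# 40 concurrent requests handled safely per instance, 30% headroom.
print(instances_needed(1200, 250, 40, 30))
print(instances_needed(1200, 250, 40, 0))   # same load with no headroom
```

With these assumed inputs, peak concurrency is 300 requests in flight. That needs 8 instances with no headroom and 10 with 30% headroom for spikes and instances that are restarting or cold.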