Latency Budget
A latency budget is a total p95 response-time target split across the hops a request takes (network, app, database, cache, third parties) so each layer knows the time it is allowed to spend.
also: latency budget · response time budget
Start with the number users should feel, say 300ms at p95, then carve it up: 40ms network, 30ms TLS and edge, 60ms app logic, 120ms database, 50ms third-party call. Now every layer has a ceiling, and when the page is slow you compare each hop's measured time to its allocation instead of guessing. The hop that blew its allocation is the one to fix.
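The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring setup: the hop names, the `over_budget` helper, and the measured values are all hypothetical, and treating per-hop p95 figures as simply additive is a simplification.

```python
# Budget from the example above, in milliseconds at p95.
BUDGET_MS = {
    "network": 40,
    "tls_edge": 30,
    "app": 60,
    "database": 120,
    "third_party": 50,
}
TARGET_MS = 300


def over_budget(actual_ms, budget_ms=BUDGET_MS):
    """Return {hop: overrun_ms} for hops above their allocation, worst first."""
    overruns = {
        hop: actual_ms[hop] - limit
        for hop, limit in budget_ms.items()
        if actual_ms.get(hop, 0) > limit
    }
    # Sort by overrun so the hop to fix first comes out on top.
    return dict(sorted(overruns.items(), key=lambda kv: kv[1], reverse=True))


# The allocations should add up to the overall target.
assert sum(BUDGET_MS.values()) == TARGET_MS

# Hypothetical measured p95 per hop (made-up numbers for illustration).
measured = {"network": 35, "tls_edge": 28, "app": 72, "database": 410, "third_party": 48}
print(over_budget(measured))  # {'database': 290, 'app': 12}
```

Here the database is 290ms over its ceiling and the app 12ms over, so the database is where to look first.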
Budgeting before building changes the design. If the database gets 120ms and a query plan needs 400ms, you learn that during design (add an index, cache it, denormalise) rather than after launch. Leave a slack line in the budget too, because the parts you do not control (DNS, a payment provider, a cold cache) will spend more than you hoped.
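A design-time check with a slack line might look like the sketch below. The 30ms slack, the `check_plan` helper, and the 90ms estimate for the indexed query are assumptions chosen for illustration; only the 300ms target and the 400ms query plan come from the text above.

```python
def check_plan(target_ms, slack_ms, estimates_ms):
    """Return (fits, headroom_ms) for design-time estimates against target minus slack."""
    spendable_ms = target_ms - slack_ms  # what the layers may spend in total
    planned_ms = sum(estimates_ms.values())
    return planned_ms <= spendable_ms, spendable_ms - planned_ms


TARGET_MS = 300
SLACK_MS = 30  # assumed reserve for DNS, a payment provider, a cold cache

# First draft: the query plan needs 400ms.
first_draft = {"network": 40, "tls_edge": 30, "app": 60, "database": 400, "third_party": 50}
# Hypothetical revision: adding an index brings the estimate down to 90ms.
with_index = {**first_draft, "database": 90}

print(check_plan(TARGET_MS, SLACK_MS, first_draft))  # (False, -310)
print(check_plan(TARGET_MS, SLACK_MS, with_index))   # (True, 0)
```

Reserving the slack up front means the layers share 270ms rather than 300ms, so the overrun shows up on paper before any code ships.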
Questions & answers
- How do I set a latency budget?
- Start from the response time your users need, then divide it across the hops a request actually makes and leave a slack line for the parts you do not control. Measure each hop against its allocation so the slow layer is obvious instead of guessed.
- What is a good latency target?
- For user-facing API or page responses, a p95 in the low hundreds of milliseconds feels fast; under about 100ms feels instant. The right number depends on the interaction, but pick it from user perception, then budget the backend to fit inside it.