Error Budget

An error budget is the amount of failure an SLO permits: the share of requests or minutes you are allowed to lose before you have to stop shipping and fix reliability.

99.9% over 5M req / mo = 5,000 failed requests. Spend it on deploys, not surprises.

If your SLO is 99.9%, your error budget is the other 0.1%. Over five million requests a month that is five thousand allowed failures; over a 30-day window it is about 43 minutes of downtime. The budget is the same fact as the SLO, expressed as room to spend rather than a bar to clear.
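The arithmetic above is small enough to sketch in a few lines of Python. This is an illustration, not a reference implementation: the function name `error_budget` and its parameters are made up for this example, and it assumes a 30-day window unless told otherwise.

```python
def error_budget(slo_target: float, total_requests: int, window_days: int = 30):
    """Return (allowed_failed_requests, allowed_downtime_minutes) for an SLO.

    slo_target is a fraction, e.g. 0.999 for 99.9%.
    total_requests is the request count over the whole window (an assumption:
    the caller supplies an estimate or a measured figure).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")

    # The budget is whatever the SLO does not promise.
    budget_fraction = 1.0 - slo_target

    allowed_failures = budget_fraction * total_requests
    allowed_minutes = budget_fraction * window_days * 24 * 60
    return allowed_failures, allowed_minutes


failures, minutes = error_budget(0.999, 5_000_000)
print(f"{failures:,.0f} failed requests, {minutes:.1f} minutes")
# prints: 5,000 failed requests, 43.2 minutes
```

The two outputs are the same budget in different units, so pick whichever matches how your SLI is measured: failed requests for a request-based SLO, minutes for a time-based one.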

Framing it as a budget makes reliability a resource you trade against velocity. While the budget has room, you ship. When it is spent, the policy is to freeze risky changes and pay down reliability debt until the window resets. That takes the argument out of each individual incident, because the number decides.
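One way to let the number decide is to encode the policy as a check that a deploy pipeline could call. The sketch below is hypothetical: `budget_remaining` and `may_ship_risky_change` are names invented for this example, and it assumes you already track failed requests for the current window and know (or estimate) the window's total request count.

```python
def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Failed requests still allowed in this window (negative means overspent)."""
    allowed = (1.0 - slo_target) * total_requests
    return allowed - failed_requests


def may_ship_risky_change(slo_target: float, total_requests: int, failed_requests: int) -> bool:
    """Pre-agreed policy: risky changes ship only while budget is left.

    Bug fixes and reliability work are assumed to bypass this gate.
    """
    return budget_remaining(slo_target, total_requests, failed_requests) > 0


# 99.9% over 5M requests allows 5,000 failures.
print(may_ship_risky_change(0.999, 5_000_000, failed_requests=3_200))  # True
print(may_ship_risky_change(0.999, 5_000_000, failed_requests=5_200))  # False
```

Keeping the rule this simple is deliberate: a single comparison that both sides agreed to in advance leaves little to argue about when the gate closes.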

Error budgets only work if both sides honour them. Product gets to move fast while there is budget, and engineering gets an automatic, pre-agreed brake when there is not.

Free tool: SLO & Error Budget Calculator. Turn an availability target into the downtime, failed requests, and burn rate it allows.

Questions & answers

How do you calculate an error budget?

Take one minus your SLO target, then apply it to the window. For requests, multiply the unavailable fraction by your monthly request count. For time, multiply it by the minutes in the window. At 99.9% that is 0.1% of requests, or about 43 minutes a month.

What happens when the error budget runs out?

The standard policy is a change freeze on anything risky until the window resets or you claw budget back by fixing the underlying problem. Bug fixes and reliability work continue; new feature rollouts pause. The freeze is automatic so nobody has to win an argument mid-incident.
