p95 Latency

p95 latency is the response time that 95% of requests come in under: a tail-latency measure that, unlike an average, reflects what your slowest and most-affected users actually experience.

also: p95 · p99 · tail latency · percentile latency · latency percentiles

avg 80ms, p99 2s · 1 in 100 users waits 2 seconds · the average never shows it

Averages hide the pain. A service can average 80ms while one request in a hundred takes two seconds, and that slow one percent lands disproportionately on your power users, who make the most requests and so are most likely to hit the slow path. Percentiles surface that: p95 is the value 95% of requests beat, p99 the value 99% beat. The gap between p50 and p99 is the shape of your tail.
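A small synthetic sample makes the gap concrete. This is a minimal sketch, not a measurement: the 1,000 latencies below are invented so that the mean comes out at 80ms, and `percentile` is a hypothetical helper using the nearest-rank definition (monitoring systems often interpolate instead, so their numbers can differ slightly).

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


# Invented data (an assumption for illustration): 1,000 requests,
# mostly fast, with a small slow tail at 2 seconds.
latencies_ms = [50] * 900 + [125] * 88 + [2000] * 12

print("mean:", statistics.mean(latencies_ms), "ms")
for p in (50, 95, 99):
    print(f"p{p}:", percentile(latencies_ms, p), "ms")
```

On this sample the mean and the median both look healthy, p95 is only mildly elevated, and p99 is the first number that exposes the two-second tail.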

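Tail latency compounds. If a page makes ten backend calls in parallel and waits for the slowest, the page's latency tracks the backends' tail, not their median. Assuming the calls are independent, the chance that at least one of the ten lands beyond the backend p95 is 1 − 0.95^10 ≈ 40%, and beyond the backend p99 about 10%, so the page's median sits near the backends' p93 and its p95 near their p99.5. Fan-out turns a rare backend slowdown into a common frontend one, which is why you budget and alert on p95 or p99 rather than the mean.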
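The fan-out effect can be checked with a few lines of arithmetic. The sketch below assumes the backend calls are independent and identically distributed, which real systems only approximate (correlated slowdowns change the numbers); `prob_any_slower_than` is a hypothetical helper name.

```python
def prob_any_slower_than(percentile, fan_out):
    """Chance that at least one of `fan_out` independent calls
    is slower than the backend's given latency percentile."""
    return 1 - (percentile / 100) ** fan_out


# Ten parallel calls, page waits for the slowest one.
for p in (50, 95, 99):
    chance = prob_any_slower_than(p, fan_out=10)
    print(f"at least one call slower than backend p{p}: {chance:.1%}")
```

With ten calls, a page load is nearly certain to include a call slower than the backend median, has roughly a 40% chance of including one beyond the backend p95, and roughly a 10% chance of including one beyond the backend p99.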

Free tool: Latency Budget Calculator. Carve a p95 target across the hops a request takes and find the one that overspends.

Questions & answers

What is the difference between p95 and average latency?

The average is pulled down by the many fast requests and hides the slow ones. p95 is the response time 95% of requests come in under, so it marks the threshold that your slowest 5% of requests exceed, which is usually where complaints and abandonment come from.

Should I target p95 or p99?

It depends on stakes and fan-out. p95 is a reasonable default for user-facing latency. Move to p99 (or p99.9) when a single slow request is costly, or when one user action triggers many backend calls, because fan-out makes rare tail events common at the top level.
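To choose a backend percentile for a given fan-out, the same arithmetic can be run in reverse. This is a rough sketch under the assumption of independent, identically distributed backend calls where the user waits for the slowest; `required_backend_percentile` is a hypothetical helper, not part of any library.

```python
def required_backend_percentile(page_percentile, fan_out):
    """Backend percentile whose latency bounds the page-level percentile
    when the page waits for the slowest of `fan_out` independent calls."""
    return 100 * (page_percentile / 100) ** (1 / fan_out)


for page_p in (95, 99):
    backend_p = required_backend_percentile(page_p, fan_out=10)
    print(f"page p{page_p} with 10 calls -> watch backend p{backend_p:.2f}")
```

For ten parallel calls, holding a page-level p95 means watching roughly the backend p99.5, and holding a page-level p99 means watching roughly the backend p99.9.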
