p95 Latency
p95 latency is the response time that 95% of requests come in under: a tail-latency measure that, unlike an average, reflects what your slowest and most-affected users actually experience.
also: p95 · p99 · tail latency · percentile latency · latency percentiles
Averages hide the pain. A service can average about 120ms while most requests finish in 20ms and one request in twenty takes two seconds, and that one in twenty is disproportionately your power users, who make the most requests and so are most likely to hit the slow path. Percentiles surface that: p95 is the value 95% of requests beat, p99 the value 99% beat. The gap between p50 and p99 is the shape of your tail.
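To make the mean-versus-percentile gap concrete, here is a minimal Python sketch. The `percentile` helper and the sample values are illustrative assumptions, not output from any particular monitoring tool. It uses the nearest-rank definition; production systems often interpolate or estimate percentiles from histograms instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample of 100 request latencies in milliseconds:
# 90 fast, 5 moderately slow, 5 very slow.
latencies_ms = [20] * 90 + [300] * 5 + [2000] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f}ms p50={p50}ms p95={p95}ms p99={p99}ms")
# prints: mean=133.0ms p50=20ms p95=300ms p99=2000ms
```

The mean lands at a value that almost no request in this sample actually experienced, while p50, p95, and p99 together show that the typical request is fast and the tail is long.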
Tail latency compounds. If a page makes ten backend calls in parallel and waits for the slowest, the page's latency tracks the backends' tail, not their median. Assuming the calls are independent, the chance that at least one of the ten lands above the backend p95 is about 40%, and above the backend p99 about 10%, so the page's median sits near the backend p93. Fan-out turns a rare backend slowdown into a common frontend one, which is why you budget and alert on p95 or p99 rather than the mean.
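The fan-out arithmetic can be checked in a few lines of Python. This is a sketch under one loud assumption: backend calls are independent and identically distributed, which real systems only approximate (correlated slowdowns make the picture worse). The function name `prob_any_slow` is made up for illustration.

```python
def prob_any_slow(fanout, quantile):
    """Chance that at least one of `fanout` independent calls exceeds the
    backend latency at `quantile` (e.g. 0.99 for p99)."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    # Each call stays at or under the quantile with probability `quantile`;
    # all of them do so with probability quantile ** fanout.
    return 1 - quantile ** fanout


for label, q in (("p50", 0.50), ("p95", 0.95), ("p99", 0.99)):
    print(f"10 calls, at least one above backend {label}: {prob_any_slow(10, q):.1%}")
# prints:
# 10 calls, at least one above backend p50: 99.9%
# 10 calls, at least one above backend p95: 40.1%
# 10 calls, at least one above backend p99: 9.6%
```

At a fan-out of 100, `prob_any_slow(100, 0.99)` rises to roughly 63%, so a slowdown that hits one backend call in a hundred affects most page loads.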
faq
Questions & answers
- What is the difference between p95 and average latency?
- The average is pulled down by the many fast requests and hides the slow ones. p95 is the response time 95% of requests come in under, so it marks where your slowest 5% of requests begin, which is usually where complaints and abandonment come from.
- Should I target p95 or p99?
- It depends on stakes and fan-out. p95 is a reasonable default for user-facing latency. Move to p99 (or p99.9) when a single slow request is costly, or when one user action triggers many backend calls, because fan-out makes rare tail events common at the top level.
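One way to turn the p95-or-p99 question into a number is to work backwards from the page-level target. The sketch below assumes independent parallel backend calls and a page that waits for the slowest one; `required_backend_quantile` is a hypothetical helper name, not part of any library.

```python
def required_backend_quantile(page_quantile, fanout):
    """Per-call quantile each backend must meet so that the slowest of
    `fanout` independent calls still meets `page_quantile` for the page."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0 < page_quantile < 1:
        raise ValueError("page_quantile must be strictly between 0 and 1")
    # All calls must be under the threshold: q ** fanout = page_quantile.
    return page_quantile ** (1 / fanout)


for fanout in (1, 10, 100):
    q = required_backend_quantile(0.95, fanout)
    print(f"fan-out {fanout:>3}: page p95 needs backend p{q * 100:.2f}")
# prints:
# fan-out   1: page p95 needs backend p95.00
# fan-out  10: page p95 needs backend p99.49
# fan-out 100: page p95 needs backend p99.95
```

Under these assumptions, a page with ten parallel calls only meets its p95 budget if each backend meets that budget at roughly its own p99.5, which is the case for alerting on p99 or tighter once fan-out grows.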