What causes a cold start?

A cold start happens when no warm instance is available, so the platform must create and initialise one before serving: allocating the container, loading the runtime or image, and running your startup code. This occurs after scaling from zero, during sudden spikes that exceed warm capacity, and when the platform recycles instances.

How do you reduce cold starts?

Keep a minimum number of instances warm, shrink the deployment image, and move slow initialisation (connection pools, large config, model loading) off the request path or behind lazy loading. The trade-off is that warm instances cost money while idle, so size the minimum to your tail-latency needs.
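As a minimal sketch of the lazy-loading option, assuming a Python service: the expensive resource is built on first use instead of at startup. `LazyResource`, `build_pool`, and `handle_request` are hypothetical names for illustration, and the dictionary stands in for a real connection pool.

```python
import threading


class LazyResource:
    """Builds an expensive resource on first use, then reuses it."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        # Double-checked locking: only the first caller pays for the build,
        # and concurrent first callers do not build it twice.
        if not self._ready:
            with self._lock:
                if not self._ready:
                    self._value = self._factory()
                    self._ready = True
        return self._value


init_calls = []  # records each time the slow setup actually runs


def build_pool():
    # Hypothetical stand-in for opening a connection pool.
    init_calls.append("built")
    return {"pool": "connected"}


pool = LazyResource(build_pool)


def handle_request(needs_db: bool) -> str:
    # Requests that never touch the database never trigger the setup.
    if not needs_db:
        return "ok (no pool needed)"
    conn = pool.get()
    return f"ok ({conn['pool']})"
```

Lazy loading does not remove the cost. It moves it from instance startup to the first request that actually needs the resource, so it helps most when many requests never touch the slow dependency.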

Cold Start

A cold start is the extra latency the first request pays when a serverless instance or container has to be created and initialised from nothing before it can serve traffic.

also: cold start · serverless cold start · scale to zero

Scale-to-zero tax: +50 ms to several seconds, paid by the first request after idle.

When there is no warm instance ready, the platform has to allocate one, pull the image or runtime, start the process, run your initialisation (load config, open connections, warm caches), and only then handle the request. On Cloud Run, Lambda, or similar, that can add anywhere from tens of milliseconds to several seconds, landing entirely on whoever hit the system first after a scale-up or an idle period.
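One way to see this in your own logs is to tag the first request each instance serves. The sketch below assumes a Python handler whose module-level code runs once per instance; `load_config` and `handler` are hypothetical names, and the sleep stands in for real initialisation work. It measures only your own startup code, not the platform's time to allocate the instance or pull the image.

```python
import time

_init_started = time.monotonic()


def load_config():
    # Hypothetical stand-in for loading config, opening connections, warming caches.
    time.sleep(0.05)
    return {"region": "example"}


# Module-level code runs once per instance, before the first request is handled.
CONFIG = load_config()
INIT_MS = (time.monotonic() - _init_started) * 1000.0

_is_cold = True  # flipped to False after the first request on this instance


def handler(event):
    global _is_cold
    was_cold = _is_cold
    _is_cold = False
    # Report the init time only on the request that actually paid for it.
    return {
        "cold_start": was_cold,
        "init_ms": round(INIT_MS, 1) if was_cold else 0.0,
    }
```

Logging the `cold_start` flag alongside request latency makes it possible to measure how often cold starts happen and how much of the tail they account for.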

Cold starts mostly hurt the tail and the edges: low-traffic services that keep scaling to zero, sudden spikes that outrun warm capacity, and anything with heavy init. The usual fixes are to keep a minimum number of instances warm, shrink the image and defer slow init work, and move connection setup out of the request path. The trade-off is cost, because warm instances bill while idle, so you are buying tail latency back with money.
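The arithmetic behind that trade can be sketched roughly as follows. Both function names are hypothetical, the hourly price is a made-up placeholder rather than any provider's rate, and the percentile check assumes cold requests are the slowest ones you serve.

```python
def percentile_hits_cold(cold_fraction: float, percentile: float) -> bool:
    """True if the given latency percentile falls among cold requests.

    Assumes cold requests are the slowest requests, so they occupy the top
    `cold_fraction` of the latency distribution.
    """
    return cold_fraction > 1.0 - percentile / 100.0


def monthly_idle_cost(min_instances: int, hourly_price: float, hours: float = 730.0) -> float:
    """Rough monthly bill for instances kept warm around the clock.

    `hourly_price` is a placeholder; substitute your platform's actual rate.
    """
    return min_instances * hourly_price * hours


# If 2% of requests are cold, p99 lands on a cold start but p95 does not.
p99_affected = percentile_hits_cold(0.02, 99)
p95_affected = percentile_hits_cold(0.02, 95)

# Two warm instances at a hypothetical 0.05 per instance-hour.
cost = monthly_idle_cost(2, 0.05)
```

Read together, the two numbers frame the decision: how much of the tail cold starts currently own, and what it would cost per month to take it back.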


