by Yan Cui

I’m afraid you’re thinking about AWS Lambda cold starts all wrong

Photo by Matthew Henry on Unsplash

When I discuss AWS Lambda cold starts with folks in the context of API Gateway, I often get responses along the lines of:

Meh, it’s only the first request, right? So what if one request is slow, the next million requests would be fast.

Unfortunately, that is an oversimplification of what happens.

A cold start happens once for each concurrent execution of your function.

API Gateway reuses concurrent executions of your function if possible. Based on my observations, it might even queue up requests for a short time in the hope that one of the concurrent executions would finish and become reusable.

If user requests happen one after another, then you will only experience one cold start in the process. You can simulate this using Charles proxy by repeating a captured request with a concurrency setting of 1.


In the resulting timeline, only the first request experienced a cold start. Its response was much slower than the rest.

1 out of 100 — that’s bearable. Hell, it won’t even show up in my 99th percentile latency metric.


What if the user requests came in droves instead? After all, user behaviours are unpredictable and unlikely to follow the nice sequential pattern we saw above. So let’s simulate what happens when we receive 100 requests with a concurrency of 10.
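If you’d rather script this than use Charles, here’s a minimal sketch in Python that reproduces the experiment. It’s an illustration only: the endpoint URL is a placeholder, and setting CONCURRENCY to 1 reproduces the sequential run from earlier.

```python
# load_test.py: fire N requests at an API Gateway endpoint with a fixed
# level of concurrency and print each response time.
# Requires the third-party `requests` library (pip install requests).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://example.execute-api.us-east-1.amazonaws.com/dev/hello"  # placeholder
TOTAL_REQUESTS = 100
CONCURRENCY = 10  # set to 1 to reproduce the sequential experiment above

def timed_get(i):
    start = time.perf_counter()
    status = requests.get(URL).status_code
    return i, status, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for i, status, elapsed in pool.map(timed_get, range(TOTAL_REQUESTS)):
        print(f"request {i:3d}: HTTP {status} in {elapsed * 1000:.0f} ms")
```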


Now things don’t look quite as rosy — the first 10 requests were all cold starts! This is problematic if your traffic pattern is bursty around specific times of the day or specific events, for example:

  • Food ordering services (like JustEat and Deliveroo) have bursts of traffic around meal times
  • E-commerce sites have highly concentrated bursts of traffic around popular shopping days of the year, like Cyber Monday and Black Friday
  • Betting services have bursts of traffic around sporting events
  • Social networks have bursts of traffic around notable events happening around the world

For these services, the sudden bursts of traffic mean API Gateway would add more concurrent executions of your Lambda function. That equates to bursts of cold starts, and that’s bad news for you.

These are also the most crucial periods for your business, when you want your service to be on its best behavior.


If the spikes are predictable, then you can mitigate the effect of cold starts by pre-warming your API.

For example, in the case of a food ordering service, you know there will be a burst of traffic at noon. You can schedule a cron job using a CloudWatch scheduled event at 11:58am to trigger a Lambda function. This function would generate a burst of concurrent requests to force API Gateway to spawn the desired number of concurrent executions ahead of time.
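As a sketch of what that warm-up function might look like (assuming the Python runtime; the endpoint URL, the X-Warmup header, and the concurrency level are all placeholder values you would tune for your own API):

```python
# warmup.py: Lambda function, triggered by a CloudWatch scheduled event,
# that fires a burst of concurrent requests at the API so API Gateway
# spins up concurrent executions ahead of the expected traffic spike.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/order"  # placeholder
DESIRED_CONCURRENCY = 50  # how many concurrent executions to pre-warm

def ping(_):
    # The X-Warmup header (a made-up name) lets the real handler
    # recognize and short-circuit these requests.
    req = urllib.request.Request(API_URL, headers={"X-Warmup": "true"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def handler(event, context):
    # The requests must be in flight at the same time to force API Gateway
    # to spawn separate concurrent executions; a thread pool gives us that.
    with ThreadPoolExecutor(max_workers=DESIRED_CONCURRENCY) as pool:
        statuses = list(pool.map(ping, range(DESIRED_CONCURRENCY)))
    return {"warmed": len(statuses)}
```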

You can use HTTP headers to tag these requests. The handling function can then distinguish them from normal user requests and short-circuit.
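On the handling side, the short-circuit can be a one-line check at the top of the function. Again a sketch, assuming the Lambda proxy integration (which passes request headers through on the event) and the same made-up X-Warmup header:

```python
# handler.py: the real API handler short-circuits warm-up pings so they
# never touch the database or show up as real orders.
def handler(event, context):
    # Normalize header names, since clients may vary the casing.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}

    if headers.get("x-warmup") == "true":
        # Warm-up ping: the container is initialized now, return immediately.
        return {"statusCode": 200, "body": "warmed"}

    # ... normal request handling goes here ...
    return {"statusCode": 200, "body": "order placed"}
```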


Does it not betray the ethos of serverless computing that you shouldn’t have to worry about scaling?

Yes, it does, but making users happy trumps everything else. Your users are not happy to wait for your function to cold start so they can order their food. The cost of switching to a competitor is so low nowadays, what’s stopping them from leaving you?

You could also consider reducing the impact of cold starts instead, by shortening their duration:

  • Author your Lambda functions in a language that doesn’t incur a high cold start time — that is, Node.js, Python, or Go
  • Use a higher memory setting for functions on the critical path, including intermediate APIs
  • Optimize your function’s dependencies and package size
  • Stay as far away from VPCs as you can! Lambda creates ENIs (elastic network interfaces) in the target VPC, which can add up to 10s (yeah, you’re reading that right) to your cold start

There are also two other factors to consider:

What about APIs that are seldom used? In that case, every invocation might be a cold start if too much time passes between invocations. To your users, these APIs are always slow, so they’re used less, and it becomes a vicious cycle.

For these, you can use a cron job (as in, a CloudWatch scheduled event with a Lambda function as target) to keep them warm. The cron job would run every 5–10 mins and ping the API with a special request. By keeping these APIs warm, your users would not have to endure cold starts.
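The ping function itself can be tiny. A minimal sketch, assuming the Python runtime, a CloudWatch Events rule with a schedule expression like rate(5 minutes), and the same made-up X-Warmup header as before:

```python
# keep_warm.py: Lambda function, scheduled every 5-10 minutes, that pings
# a seldom-used API so its warm container doesn't get reclaimed.
import urllib.request

API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/report"  # placeholder

def handler(event, context):
    req = urllib.request.Request(API_URL, headers={"X-Warmup": "true"})
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status}
```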


This approach is less effective for busy functions with lots of concurrent executions. The ping message would only reach one of the concurrent executions, and there is no way to direct it to specific executions. In fact, there is no reliable way to know the exact number of concurrent executions for a function at all.

Also, if the number of concurrent user requests drops, then it’s in your best interest to let idle executions be garbage collected. After all, you wouldn’t want to pay for resources you don’t need.


This post is not intended to be your one-stop guide to AWS Lambda cold starts. It’s intended to illustrate that talking about cold starts is a more nuanced discussion than “the first request.”

Cold starts are a characteristic of the platform that we have to live with. And we love the AWS Lambda platform and want to use it, as it delivers on so many fronts. Nonetheless, it’s important not to let our own preferences blind us to what’s important. Keeping our users happy and building a product that they love is always the most important goal.

To that end, you do need to know the platform you’re building on. With the cost of experimentation being so low, there’s no reason not to experiment with AWS Lambda yourself. Try to learn more about how it behaves and how you can make the most of it.