Cold Starts Suck! Here’s How To Deal With Them.

Taavi Rehemägi

August 15th, 2018

Before I jump in, I’d like to take a step back to provide a better frame of reference for our fellow developers who are only now getting into serverless and might find some of the topics covered here a bit confusing.

Every time I talk about serverless computing, I get interrupted by people pointing out that serverless is in fact not server-less and that you have servers somewhere. And of course, you have an actual machine that runs your code. Serverless is not magic. There aren’t any magic server elves that take your code and run it on their magical cloud that connects to your modem. I admit that the wording can be confusing, but if we can all just move past that, I’m sure we can build wonderful things using serverless.

Back to the point: cloud computing is basically a network of machines that talk to each other, and what makes them different from a local computer or a server is that they are accessed 100% remotely and can scale gracefully. Serverless computing is, to an extent, similar to this arguably juvenile explanation of cloud computing. You upload your code to a “Function as a Service” provider like AWS Lambda, and it then gets made available based on requests (as it’s event-driven). The code I mentioned is but a function (hence the term function as a service) that runs, returns a value and dies. Each function is basically a self-sufficient, completely stateless, event-driven transient container that will be deleted altogether after a period of time.

How do cold starts happen?

Cold start refers to the state of our function when serving a particular invocation request. A serverless function is served by one or multiple micro-containers. If there isn’t a container readily available, the provider will spin up a new one, which causes a delay. This is what we call a “cold start”.

That’s how cold starts happen: after a period of inactivity, the container gets shut down. It then needs to be spun up again, which can take up to 5 seconds on AWS Lambda (some providers take way longer than that), and as you might imagine, having a 5-second request delay in production is going to be a hard pill to swallow. It’s one of the biggest drawbacks that keep people from switching to serverless from traditional servers, alongside the dreaded vendor lock-in.

Do we even need cold starts?

But cold starts are a necessary evil. One of the main benefits of serverless computing is its basically infinite scaling, and the way this happens is that your functions run for whatever period of time they need to and then, after what now seems to amount to 45 minutes of inactivity for AWS Lambda, they get destroyed. That’s why providers have room to scale your application and can charge only for invocations and execution time.

For new containers to spin up, inactive ones need to die. It’s how serverless works. I’ve mentioned that the amount of time before the container gets destroyed is 45 minutes but that might vary based on the availability and demand in your current region.

How do you identify cold starts?

One of the best ways of finding cold starts within your app is by using a serverless observability tool like Dashbird. You sign up for the free tier and get up to 1 GB of AWS logs. Once you have signed up, log in to the app and go to your Lambdas, where you can see the status of the last invocation and filter for cold starts specifically.
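If you just want a quick look before reaching for a tool, a common trick is to let the function report its own cold starts. The sketch below is only an illustration, not how Dashbird detects them: it relies on the fact that module-level code runs once per container, and the names `_cold_start` and `handler` are made up for this example.

```python
import time

# Module-level code runs once per container, so this flag is True only
# for the first invocation served by a freshly started container.
_cold_start = True
_container_started_at = time.time()


def handler(event, context):
    """Hypothetical Lambda handler that reports whether it ran cold."""
    global _cold_start
    was_cold = _cold_start
    _cold_start = False

    # On AWS Lambda, printed output ends up in the function's CloudWatch logs.
    age = time.time() - _container_started_at
    print(f"cold_start={was_cold} container_age_s={age:.1f}")

    return {"cold_start": was_cold}
```

Calling the handler twice in the same process returns `cold_start` as `True` the first time and `False` afterwards, which mirrors what happens inside a single container.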

How do I avoid cold starts?

Unless it’s a critical function of the app, I usually take the 5-second hit, especially if it happens only once a day. But more often than not I find myself needing to create a “wake-up call” for my lambdas, which is exactly what it sounds like: every 25 minutes or so I make a call to each lambda to keep it “warm”, thus avoiding the cold starts. And before you ask, yes, this will end up costing more, but I believe it’s a small price to pay for a simple way to shut the mouths of the naysayers who complain about cold starts like they’re the worst thing to ever happen to software development, to which I’m always like “Hello, the worst thing to happen to software development is PHP!”
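On the receiving end, the function should recognise a wake-up call and return right away so the ping stays cheap. Here is a minimal sketch of that idea; the `warmup` payload key and the `do_real_work` helper are assumptions made up for this example, not part of any AWS API.

```python
# Hypothetical marker that the scheduled wake-up call puts in its payload.
WARMUP_KEY = "warmup"


def do_real_work(event):
    """Placeholder for the function's actual business logic."""
    return {"statusCode": 200, "body": f"processed {event.get('item', 'nothing')}"}


def handler(event, context):
    # A wake-up call only needs the container to stay alive, so return
    # before touching databases or other downstream services.
    if isinstance(event, dict) and event.get(WARMUP_KEY):
        return {"warmed": True}
    return do_real_work(event)
```

A scheduled trigger would then send `{"warmup": true}` as the event every 25 minutes or so, while real requests go through `do_real_work` untouched.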

But I digress. The important thing to note here is that cold starts are a necessary evil, and if you absolutely, positively can’t have that 5-second delay on your calls, you can always create a simple “wake-up call” function that will call one, two or all of your functions at a regular interval so they don’t get the chance to go cold. And while you get an increase in cost, let me put that cost in perspective. If you call a function every 25 minutes, you will make roughly 1,800 calls a month. Amazon gives you 1 million calls for free every month. And if the million calls aren’t enough, the wake-up calls cost about $0.00036 per function per month in request charges.
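As a rough sanity check on those numbers, here is a small back-of-the-envelope calculation. It assumes a 30-day month and a request price of $0.20 per million calls, and it ignores the (tiny) execution-time charge for each ping, so treat it as an estimate rather than a bill.

```python
MINUTES_PER_MONTH = 30 * 24 * 60       # assuming a 30-day month
PING_INTERVAL_MINUTES = 25             # interval used in the article
PRICE_PER_MILLION_REQUESTS = 0.20      # assumed AWS Lambda request price in USD

# Number of wake-up calls one function receives per month.
calls_per_month = MINUTES_PER_MONTH // PING_INTERVAL_MINUTES

# Request charge for those calls once the free tier is used up.
request_cost = calls_per_month * PRICE_PER_MILLION_REQUESTS / 1_000_000

print(calls_per_month)             # 1728
print(f"${request_cost:.7f}")
```

That works out to 1,728 calls in a 30-day month, so the 1,800 figure is a slight round-up, and the cost stays at a few hundredths of a cent per function either way.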
