A serverless application in its infancy looks and runs very differently from one at scale. When there are more components to manage, operational excellence is rooted in serverless best practices. Dashbird was created with the mission to help developers succeed with modern cloud environments, no matter their size. As experienced developers ourselves, we’ve faced and understand the challenges of running serverless architecture at scale. In this article, we run through the common serverless challenges, and the architectural patterns and best practices to combat them.
As with anything, we should be constantly aspiring to catch problems sooner rather than later. Here is an example of an established but early-stage serverless application:
As you can see, its workflow is simple and the load is minimal, meaning the requests, execution times and concurrency are manageable.
In just a few months, that same architecture can look like this:
As load increases, the existing infrastructure comes under stress. This is a great exercise in identifying the potential points of failure in your system, and the scenarios in which they could occur. In this example, you can see clearly how each source has its own limit, leading to either failure or performance degradation. It’s important to remember that while different services have different API limits and throttling limits, failures can also happen through configuration mistakes and code errors.
Common issues at higher loads:
Lambda concurrency is the number of requests that your function serves at any given time. A good formula for estimating Lambda concurrency is:
Average Execution Time (seconds) × Average Requests per Second = Estimated Concurrency
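As a quick worked example, the formula can be sketched in Python. The function name and the input values are illustrative assumptions, not measurements from a real system.

```python
import math


def estimate_concurrency(avg_execution_seconds: float,
                         avg_requests_per_second: float) -> int:
    """Estimate how many Lambda containers run at the same time.

    Applies: average execution time (s) x average requests per second,
    rounded up because a fraction of a container is still a container.
    """
    if avg_execution_seconds < 0 or avg_requests_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(avg_execution_seconds * avg_requests_per_second)


# Illustrative values only: 2.5 s average execution at 3,000 requests/second.
estimated = estimate_concurrency(2.5, 3000)
print(f"Estimated concurrency: {estimated}")
```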
This helps to determine the number of containers that’ll be used simultaneously. With this in mind, let’s remind ourselves of some default AWS limits in place.
Lambda burst limits still apply even when steady-state concurrency is running fine. The initial burst limit on functions is between 500 and 3,000 concurrent executions (region dependent), after which concurrency can scale up by a further 500 every minute.
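To see what the burst limit means in practice, here is a rough sketch that estimates how long it takes to reach a target concurrency. It assumes the figures quoted above (an initial burst, then 500 more per minute); the function name and example targets are hypothetical.

```python
import math


def minutes_to_reach(target_concurrency: int,
                     initial_burst: int,
                     step_per_minute: int = 500) -> int:
    """Minutes of scaling needed after the initial burst is used up.

    Returns 0 when the target already fits inside the initial burst.
    """
    if target_concurrency <= initial_burst:
        return 0
    remaining = target_concurrency - initial_burst
    return math.ceil(remaining / step_per_minute)


# Illustrative target of 7,500 concurrent executions.
for burst in (500, 3000):
    print(f"initial burst {burst}: {minutes_to_reach(7500, burst)} minutes")
```

During those minutes, requests above the currently available concurrency can be throttled, which is why sudden spikes need planning for.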
Lambda concurrency limits are soft limits, built in for your protection. By default, the limit is set to 1,000 concurrent executions, but this can be changed.
API Gateway has a limit of 10k requests per second per region, which can be increased as needed. However, the 5k concurrency burst limit and the 29-second timeout limit cannot be changed.
All AWS APIs have limits, which is important to factor in when building and mapping out your application for scale. For example, KMS has a limit between 5,500-10,000 requests per second, depending on the region.
As your application scales, or if it often experiences spiky loads, these limits need to be kept in mind for stable performance.
An unoptimized at-scale serverless application would look like this:
With so many requests per second, the stress becomes clear as other resources multiply. For a relational database, 3,000 new connections per second is a huge load and can cause lag in your system. Additionally, the 7,500 containers now needed increase your costs significantly.
These are our top tips for code-level optimizations to help with this.
Using the above, the optimized at-scale serverless architecture now looks like this:
You can see a huge reduction in execution time, because the connection no longer needs to be established on every request, and in the total number of connections, resulting in far smoother performance.
A habit we can fall into is always returning a detailed database response from the API, when sometimes a simple acknowledgment is all that’s needed. By returning only an acknowledgment, you can decouple the database from the KMS request and create an asynchronous processing model using SQS and Lambda, allowing you to set your concurrency limit and control the load. There is no change to the model.
If an API response is needed, there are a few optimization tweaks to consider.
*These features may have a negative impact on the client; however, at a very high scale, the compromise can be worthwhile.
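A minimal sketch of the acknowledgment-only pattern might look like the following. It assumes an SQS queue whose URL is supplied through a hypothetical `QUEUE_URL` environment variable; the `sqs_client` parameter is an illustrative addition so the handler can be exercised with a stub instead of real AWS credentials.

```python
import json
import os


def handler(event, context, sqs_client=None):
    """Queue the work and acknowledge immediately instead of waiting on the database."""
    if sqs_client is None:
        # Imported lazily so the sketch can run locally with a stub client.
        import boto3
        sqs_client = boto3.client("sqs")

    # QUEUE_URL is a hypothetical environment variable; the fallback is a placeholder.
    queue_url = os.environ.get("QUEUE_URL", "https://example.invalid/queue")

    # Hand the payload to SQS; a separate Lambda processes it later.
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(event.get("body")),
    )

    # 202 Accepted: the request was received but not yet processed.
    return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}
```

The Lambda that consumes the queue can then be given its own concurrency limit, so the database only ever sees as much load as you allow.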
The purpose of serverless is to keep code focused on business logic, meaning that the parts of your application that add no differentiating value can be handed to managed services. Make use of the best services to support your application’s functionality.
Additionally, don’t wait in code. Instead, use Step Functions to run tasks in parallel and to enable automatic triggers and retries. This is one of the best optimization actions many of our customers have seen, both from a performance perspective and in reduced costs.
With the benefits of serverless comes a new host of monitoring challenges to overcome, which is where Dashbird can provide value and expertise.
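As an illustration, a Step Functions state machine that waits, runs two tasks in parallel and retries on failure could be described roughly as below. The definition is built as a Python dictionary and serialized to Amazon States Language JSON; the state names, the 30-second wait and the Lambda ARNs are placeholders, not values from a real deployment.

```python
import json

# Retry policy shared by both tasks: up to 3 attempts with exponential backoff.
RETRY_POLICY = [{
    "ErrorEquals": ["States.ALL"],
    "IntervalSeconds": 2,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}]


def task(name: str) -> dict:
    """Build a single-task branch that invokes a (placeholder) Lambda function."""
    return {
        "StartAt": name,
        "States": {
            name: {
                "Type": "Task",
                # Placeholder ARN: replace REGION, ACCOUNT_ID and the function name.
                "Resource": f"arn:aws:lambda:REGION:ACCOUNT_ID:function:{name}",
                "Retry": RETRY_POLICY,
                "End": True,
            }
        },
    }


state_machine = {
    "Comment": "Sketch: wait in the workflow, not in code, then fan out",
    "StartAt": "WaitForUpstream",
    "States": {
        # The Wait state replaces a sleep inside a running (and billed) function.
        "WaitForUpstream": {"Type": "Wait", "Seconds": 30, "Next": "ProcessInParallel"},
        # Both branches run at the same time; the state ends when all finish.
        "ProcessInParallel": {
            "Type": "Parallel",
            "Branches": [task("StoreRecord"), task("SendNotification")],
            "End": True,
        },
    },
}

definition_json = json.dumps(state_machine, indent=2)
print(definition_json)
```

Because the waiting and retrying happen in the workflow rather than inside a function, no Lambda is running while the state machine is idle.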
Dashbird is built on three core pillars that target all these issues:
It’s important to make the mass of data output that is already available work efficiently for us. Democratizing data breaks down traditional silos and enables users to navigate their own data more easily through customizable queries and searches. Dashbird offers prebuilt views and a simple dashboard that visualize your data for easier and quicker understanding.
The centralized platform offers dynamic resource management, where you’re able to understand resource relationships and view your entire application in one place.
Monitoring is only effective if there is continuous alert coverage across your entire infrastructure. Dashbird uses out-of-the-box automated alerts to notify you of failures and errors. These integrate seamlessly into a developer’s workflow and are sent in real time via Slack or email.
Dashbird also proactively listens to log and metric data, meaning that any potential negative trends (not yet failures) are highlighted and can be investigated before they escalate.
Building serverless applications requires consistent best practice habits, which can be difficult to maintain or even start. Using the AWS Well-Architected lens, Dashbird helps to ensure your system is built and fixed based on industry-standard best practices.
The Insights Engine detects non-binary issues such as delays, consumption issues, or limits, enabling users to take action and optimize their architecture to be reliable at any scale. Within its periodic assessments, Dashbird also helps to instill strong security and compliance practices, discovering areas needing encryption, inactive resources, and over- or under-provisioned components, all of which can increase exposure to attacks.