How Professional Serverless Teams Manage Software Issues

Taavi Rehemägi

March 30th, 2020

No matter how careful developers are or how comprehensive the tests run before deployment, some issues will always surface in production.

When it comes to managing issues and ensuring application quality, two main metrics should be on our radar: time to discover and time to resolve issues.

As the names suggest, the first indicates how long it takes for the development team to discover that something went wrong, and the second shows how long it usually takes to debug and fix the source of the error.
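As a rough illustration, both metrics can be computed from incident timestamps. The sketch below is not tied to any particular tool; the record fields (`started`, `discovered`, `resolved`) and the sample incidents are hypothetical, and it assumes time to resolve is measured from the moment of discovery.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_times(incidents):
    """Return (mean time to discover, mean time to resolve) as timedeltas.

    Assumption: time to resolve is counted from discovery, not from the
    moment the issue first occurred.
    """
    discover = [(i["discovered"] - i["started"]).total_seconds() for i in incidents]
    resolve = [(i["resolved"] - i["discovered"]).total_seconds() for i in incidents]
    return timedelta(seconds=mean(discover)), timedelta(seconds=mean(resolve))


# Hypothetical incident records, for illustration only.
incidents = [
    {
        "started": datetime(2020, 3, 1, 10, 0),
        "discovered": datetime(2020, 3, 1, 12, 0),
        "resolved": datetime(2020, 3, 1, 13, 0),
    },
    {
        "started": datetime(2020, 3, 2, 9, 0),
        "discovered": datetime(2020, 3, 2, 9, 2),
        "resolved": datetime(2020, 3, 2, 9, 32),
    },
]

ttd, ttr = mean_times(incidents)
print("mean time to discover:", ttd)
print("mean time to resolve:", ttr)
```

Tracking these two numbers over time shows whether changes to monitoring and debugging workflows are actually paying off.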

Both times should be as low as possible in order to guarantee the best experience for the end-user. Below are the main aspects impacting these times and tips to help you along the way:

Time to Discover

Detection and Awareness

Although it’s obvious that being able to detect issues is fundamental to dealing with them, you would be surprised by how many bugs are probably happening in your application right now without you knowing it.

This is typical for teams that still use old-school monitoring for modern distributed cloud infrastructure. Many rely solely on AWS CloudWatch to monitor Lambda functions, for example, which has several limitations, including the inability to uncover issues that your team should be taking care of.

Proactive Alerting

A key to reducing time to discover is a proactive alerting system. It makes no sense to search your application logs by hand on a regular basis, looking for potential problems. That is where failure detection algorithms come in.

An automated monitoring system that detects issues for you can alert your team within a couple of seconds. Usually, you can choose whether to receive those alerts by e-mail or in a Slack channel, for instance.
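To make the idea concrete, here is a minimal sketch of one very simple failure detection approach: an error-rate threshold over a sliding window of recent invocations. It is not how Dashbird or any specific product detects failures. The class name, the window size, the threshold, and the minimum sample count are all assumptions chosen for illustration, and the alert is only built as a dictionary with a `text` field rather than sent anywhere.

```python
from collections import deque


class ErrorRateDetector:
    """Hypothetical detector: alert when the recent error rate crosses a threshold."""

    def __init__(self, window=10, threshold=0.3, min_samples=5):
        self.outcomes = deque(maxlen=window)  # True means the invocation failed
        self.threshold = threshold
        self.min_samples = min_samples
        self.alerting = False  # avoids repeating the same alert on every invocation

    def record(self, failed):
        """Record one invocation; return an alert payload on a new breach, else None."""
        self.outcomes.append(bool(failed))
        if len(self.outcomes) < self.min_samples:
            return None
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate < self.threshold:
            self.alerting = False
            return None
        if self.alerting:
            return None
        self.alerting = True
        return {"text": f"Error rate {rate:.0%} over the last {len(self.outcomes)} invocations"}


detector = ErrorRateDetector()
# Hypothetical stream of outcomes: six successes followed by four failures.
outcomes = [False] * 6 + [True] * 4
alerts = [a for a in (detector.record(o) for o in outcomes) if a is not None]
for alert in alerts:
    print(alert["text"])
```

In a real setup the returned payload would be handed to whatever delivery channel the team uses, such as e-mail or a chat webhook. Raising an alert only when the threshold is first crossed keeps a sustained outage from flooding the channel.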

Time to Resolve

Precision in Accessing Logs

In a traditional server-based infrastructure, one server or container serves multiple unrelated requests simultaneously, which makes it difficult to isolate the logs you need for debugging.

Many monitoring systems follow the same approach for Serverless functions. In CloudWatch, for instance, hundreds or even thousands of Lambda invocations may be mixed together in a single log stream.

Modern approaches, such as the one used in Dashbird, aim to isolate logs for each request. When developers need to debug, they will find the data they need well organized and easy to browse and read, all in one place. This can save numerous hours in development time and reduce the time to resolution of issues, as we have discovered from our own customers.
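The idea of isolating logs per request can be sketched in a few lines. The example below groups a flat list of log lines by the request ID found in each line. It is only an illustration, not Dashbird's implementation: the sample lines are made up, and it assumes every relevant line carries a UUID-style request ID, as the `START`, `END`, and `REPORT` lines written by Lambda do.

```python
import re
from collections import defaultdict

# UUID-shaped request ID, as found in Lambda's START/END/REPORT lines.
REQUEST_ID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def group_by_request(lines):
    """Group log lines by request ID; lines without one go to 'unattributed'."""
    groups = defaultdict(list)
    for line in lines:
        match = REQUEST_ID.search(line)
        key = match.group(0) if match else "unattributed"
        groups[key].append(line)
    return dict(groups)


# Hypothetical log lines from two invocations, mixed together.
REQ_A = "11111111-1111-1111-1111-111111111111"
REQ_B = "22222222-2222-2222-2222-222222222222"
lines = [
    f"START RequestId: {REQ_A} Version: $LATEST",
    f"START RequestId: {REQ_B} Version: $LATEST",
    f"[INFO] {REQ_A} loading order 42",
    f"[ERROR] {REQ_B} payment provider timed out",
    f"END RequestId: {REQ_A}",
    f"END RequestId: {REQ_B}",
]

groups = group_by_request(lines)
for request_id, request_lines in groups.items():
    print(request_id)
    for line in request_lines:
        print("   ", line)
```

With the lines grouped this way, a developer investigating the failed request reads three lines instead of scanning the whole stream.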

Tracing

Lastly, tracing is about profiling all interactions with external resources made during a serverless function invocation. This includes database queries, third-party API calls, and so on.

Sometimes the source of an issue is a poorly performing IO-bound operation, such as retrieving an object blob from S3 or reading information from a database. In these cases, looking only at AWS Lambda logs will not help. That’s where tracing comes in, filling the gaps in information for a fast and precise debugging experience.

Being able to read logs alongside this rich tracing profile is key for developers to identify the root causes of issues and work on solutions as quickly as possible.
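A minimal sketch of the idea, assuming nothing about any particular tracing product: wrap each external call in a timed span and inspect the spans afterwards. The `Trace` class, the span names, and the `time.sleep` calls standing in for a database query and an S3 download are all hypothetical.

```python
import time
from contextlib import contextmanager


class Trace:
    """Hypothetical in-memory trace that records how long each external call took."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the wrapped call raised an exception.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append({"name": name, "ms": elapsed_ms})

    def slowest(self):
        """Return the span with the longest duration."""
        return max(self.spans, key=lambda s: s["ms"])


def handler(trace):
    # time.sleep stands in for real IO; no external service is called here.
    with trace.span("db.query"):
        time.sleep(0.01)
    with trace.span("s3.get_object"):
        time.sleep(0.05)


trace = Trace()
handler(trace)
for s in trace.spans:
    print(f"{s['name']}: {s['ms']:.1f} ms")
print("slowest:", trace.slowest()["name"])
```

Here the slow step is visible immediately, whereas the function's own log output would only show that the invocation as a whole took longer than expected.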

Dashbird provides all of these benefits and more, and offers a 14-day free trial with no credit card required.
