How Dashbird innovates serverless monitoring

Taavi Rehemägi

January 18th, 2021

At first glance, all serverless monitoring services seem similar and aim to solve the same problems. However, at Dashbird, we have made decisions since day one that fundamentally differentiate us from our competitors. Over time, those differences have grown, and we have found increasing confirmation of, and confidence in, our approach.

Dashbird’s product strategy is based on three core pillars. According to our customers, together they make up the most complete and compelling serverless monitoring offering on the market.

The three pillars of Dashbird, each covered in more detail below, are:

Centralization of distributed data
Automation of alerts
Continuous Well-Architected insights

Additionally, Dashbird is the only serverless monitoring service that does not instrument Lambda functions.

Below, I will walk you through the decisions, benefits, and strategy behind our platform and the fundamental idea of what a good serverless monitoring approach should consist of in this day and age.

Focus on centralization, instead of tracing

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
Observability in Wikipedia

The core idea of Dashbird is that we can determine the inner state of an application by collecting, correlating, and analyzing system outputs that are already available: logs, metrics, (X-Ray) traces, and resource configurations. Currently, we integrate with seven AWS services (AWS Lambda, API Gateway, DynamoDB, Step Functions, SQS, Kinesis, and ECS), with a total setup time of less than two minutes (deploying a simple CloudFormation template to your AWS account).
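To make the idea of centralizing existing outputs more concrete, here is a minimal sketch in Python. It is not Dashbird's implementation: the `ResourceView` type, the `centralize` function, and the resource names are hypothetical, and the inputs are hard-coded stand-ins for data that would normally come from logs, metrics, and resource configurations.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ResourceView:
    """Everything known about one resource, gathered from its external outputs."""
    logs: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    config: dict = field(default_factory=dict)


def centralize(log_events, metric_points, configs):
    """Group already-emitted outputs by resource name (hypothetical helper).

    Nothing here runs inside the monitored code; it only merges data
    that the services have already produced.
    """
    views = defaultdict(ResourceView)
    for resource, message in log_events:
        views[resource].logs.append(message)
    for resource, name, value in metric_points:
        views[resource].metrics[name] = value
    for resource, config in configs.items():
        views[resource].config = config
    return dict(views)


# Illustrative sample data only.
views = centralize(
    log_events=[("checkout-fn", "Task timed out after 3.00 seconds")],
    metric_points=[("checkout-fn", "Errors", 1), ("orders-table", "ThrottledRequests", 0)],
    configs={"checkout-fn": {"timeout_s": 3, "memory_mb": 128}},
)
print(views["checkout-fn"])
```

With all three kinds of output in one place, a timeout message in the logs can be read next to the configured timeout and the error count for the same resource.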

Virtually all other serverless monitoring services instrument functions or other compute resources, collecting telemetry during execution time. We believe this is not the optimal approach in serverless and that the perfect solution lies outside of low-level information gathering.

This opinion is reinforced by what we have learned from building large-scale serverless applications ourselves and from speaking to thousands of serverless users over the years.

In serverless, the complexity shifts from the code level to the orchestration level, reducing the importance of getting granular function execution details (tracing). At the same time, application logic is now distributed across hundreds of moving parts, increasing cognitive overload and the amount of available data.
AWS Lambda is a big part of serverless, but other services introduce risks and challenges too. What all serverless services have in common is that they limit access to the code while providing logs, metrics, and traces in a predefined format.
The true effectiveness of an engineering team is dictated by its ability to access, interrogate, and understand operational data.

Abstract and automate failure detection across the stack

The biggest a-ha moment so far came when we launched error detection from CloudWatch logs. For Dashbird users, that means that from the moment they onboard, they can reduce their mean time to discovery (MTTD) by up to 80%. From our own experience and from talking to customers, finding a failing function, or other resource, among hundreds of resources is a daunting task for most teams.

This is why the second pillar of Dashbird is that we continuously analyze every log line and every metric across the system, and apply prebuilt alarms and filters, ready to detect a single point of failure amongst thousands of signals. For transparency, we have also published the library of our alarms here.

Prebuilt alarms and insights in Dashbird.

There are multiple reasons why this is especially valuable for our customers:

It is very hard to map out all the possible known and unknown failure scenarios across the stack. Dashbird’s value lies in the research and analysis we have done for those services and in offering all of the resulting alarms by default to all of our customers.
Manually managing log filters and metric alarms across the stack is a lot of work, and can also be really expensive.
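As an illustration of what a prebuilt log filter can look like, here is a minimal Python sketch that scans log lines against a small rule library. It is not Dashbird's alarm library: the rule names, the `detect_failures` helper, and the sample log lines are assumptions, and the patterns only approximate the general shape of AWS Lambda platform messages.

```python
import re

# Hypothetical rule library: rule name -> pattern. The patterns are
# assumptions modeled on typical Lambda log messages, not an official list.
RULES = {
    "timeout": re.compile(r"Task timed out after [\d.]+ seconds"),
    "out_of_memory": re.compile(r"Runtime exited with error: signal: killed"),
    "unhandled_exception": re.compile(r"Traceback \(most recent call last\)|\bERROR\b"),
}


def detect_failures(log_lines):
    """Return (resource, rule) pairs for every log line matching a rule.

    Each line is reported at most once, under the first rule it matches.
    """
    findings = []
    for resource, line in log_lines:
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((resource, rule))
                break
    return findings


# Illustrative sample lines only.
logs = [
    ("checkout-fn", "2021-01-18T10:00:03Z abc Task timed out after 3.00 seconds"),
    ("resize-fn", "RequestId: abc Error: Runtime exited with error: signal: killed"),
    ("orders-fn", "START RequestId: abc Version: $LATEST"),
    ("orders-fn", "[ERROR] KeyError: 'order_id'"),
]
findings = detect_failures(logs)
print(findings)
```

Maintaining such rules by hand for every service in the stack is the manual work described above; shipping them by default removes it.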

Continuous insights towards the Well-Architected Framework and best practices

Adopting serverless means using a variety of managed cloud services and educating the whole team in best practices for using them. Organizations building and operating a serverless stack struggle with two fundamental challenges: keeping the stack aligned with those best practices and keeping the team up to date on them.

Therefore, the third pillar of Dashbird is all about:

Automatically aligning the stack with community best practices.
Educating engineering teams about the best practices and optimal settings and patterns of serverless.

Insights library for API Gateway.

The approach of collecting and analyzing all types of data about the infrastructure put us in a great position to build a collection of checks that produce insights on reliability, security, performance, cost optimization, and operational excellence. Today, this Well-Architected insights engine features over 70 checks, run continuously at intervals ranging from 5 minutes to 1 day.
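The following Python sketch shows one way such interval-based checks could be expressed. It is an illustration only, not Dashbird's insights engine: the `Check` type, the two check names, their thresholds, and the resource fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    pillar: str             # e.g. "cost" or "reliability"
    interval_minutes: int   # how often the check is due
    violated: Callable[[dict], bool]  # True when the resource fails the check


# Hypothetical checks with made-up thresholds.
CHECKS = [
    Check("lambda-overprovisioned-memory", "cost", 1440,
          lambda r: r["max_memory_used_mb"] < 0.5 * r["memory_mb"]),
    Check("lambda-duration-near-timeout", "reliability", 5,
          lambda r: r["p99_duration_ms"] > 0.9 * r["timeout_s"] * 1000),
]


def run_checks(resource: dict, elapsed_minutes: int) -> list:
    """Return the names of checks that are due now and that the resource fails."""
    return [
        check.name
        for check in CHECKS
        if elapsed_minutes % check.interval_minutes == 0 and check.violated(resource)
    ]


# Illustrative resource snapshot only.
fn = {"memory_mb": 1024, "max_memory_used_mb": 200, "timeout_s": 3, "p99_duration_ms": 2900}
print(run_checks(fn, elapsed_minutes=5))     # only the 5-minute check is due
print(run_checks(fn, elapsed_minutes=1440))  # both checks are due
```

Each check reads only configuration and metric data, so it can run on a schedule without touching the monitored code.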

From a user’s perspective

Combining those three pillars gives teams an end-to-end experience of their modern cloud stack through:

Increased service quality and reduced risk of incidents. This is driven by Dashbird drastically reducing the time to discovery for most incidents in the cloud.
Time given back to developers to focus on the product and customer. When operating a serverless application in production, you are going to have to set alarms, build dashboards, and make monitoring data consumable. Dashbird takes that undifferentiated heavy lifting off your hands.
Posture and best practices management. Users of Dashbird achieve better performance, cost, reliability, and security of their cloud environment with significantly less effort.

Five years from now

The seismic shift in cloud computing will be the adoption of single-purpose, managed cloud services that free engineers from undifferentiated work and let them spend most of their time and effort on creating value specific to their organization. Over time, the importance of computing will diminish in favor of managed services. Applications will be built out of Lego blocks offered by cloud providers and third-party vendors.

Dashbird will be a centralization platform that does not just ingest and centralize operational data from popular managed services but transforms that data into universally understandable insights. The three pillars of Dashbird described above will extend to all managed services, lowering the barrier to entry and making serverless monitoring, operating, and scaling simpler.
