Monitoring vs observability

Taavi Rehemägi

August 28th, 2018

With the popularity of serverless, there has been a huge amount of debate about observability, or the lack thereof, within the serverless space. More often than not, the serverless fanboys (a group of highly intellectual individuals of which I am proud to be a part) have had the same answer to everything: third-party monitoring tools that do this or that. But you have to understand that you can’t patch the observability limitation by throwing graphs and alerts at the situation.

Monitoring and observability are two different things altogether.

Allow me to explain by using a service we’re all too familiar with: Twitter. As you might imagine, a product like Twitter has a LOT of moving parts, and when something breaks down it can be difficult to understand what caused the problem. Imagine having 350 million active users who interact with each other through your system, tweeting, liking, DM-ing, retweeting, and whatever else you can do on the platform. That’s a lot of information to follow, and if you’ve ever worked on a platform this size you can imagine the kind of effort it would take to figure out why a tweet isn’t posted or a message takes too long to be delivered. Before they made the switch from a monolithic application to a distributed system, finding out why something didn’t work was, at times, as simple as opening an error log file and seeing what went wrong.

When you have hundreds, maybe thousands, of small services communicating asynchronously with each other, saying that debugging a simple thing like a tweet not firing would be hard is a complete understatement. They’ve posted a really cool post about their migration to microservices in 2013. Read the post here.

A graph illustrating the Twitter microservices

So, back to my point. Baron Schwartz has put it best.

“Monitoring tells you whether the system works. Observability lets you ask why it’s not working.”

Monitoring an app gets you information about your system and lets you know in the event of a failure, while observability is a quality of your app, or of the technology you are using, that gives you an easy way of seeing what broke and where.
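To make the distinction a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Telemetry` class, the service names, and the request IDs are hypothetical and do not come from Twitter or from any particular tool. The counter plays the role of monitoring (does the system work?), while the stored events keep enough context to ask why it doesn’t.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """Hypothetical in-memory telemetry store, for illustration only."""
    counters: Counter = field(default_factory=Counter)
    events: list = field(default_factory=list)

    def record(self, request_id, service, status, detail=""):
        # Monitoring view: an aggregate count per outcome.
        self.counters[status] += 1
        # Observability view: keep the context needed to explain the outcome.
        self.events.append(
            {"request_id": request_id, "service": service,
             "status": status, "detail": detail}
        )

    def is_healthy(self, max_errors=0):
        """Monitoring question: does the system work?"""
        return self.counters["error"] <= max_errors

    def why(self, status="error"):
        """Observability question: what broke, and where?"""
        return [(e["request_id"], e["service"], e["detail"])
                for e in self.events if e["status"] == status]


telemetry = Telemetry()
telemetry.record("req-1", "timeline", "ok")
telemetry.record("req-2", "tweet-writer", "error", "storage timeout")
telemetry.record("req-3", "timeline", "ok")

print(telemetry.is_healthy())  # False: one error was counted
print(telemetry.why())         # the failing request, its service, and the reason
```

The health check alone only tells you that something failed. The second call is what points you at the request and the service involved.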

We know observability isn’t monitoring. What is it then?

Observability is a system attribute that, as [Anthony Asta](https://twitter.com/anthonyjasta) – Engineering Manager @Twitter – puts it, is composed of four pillars:

Monitoring
Alerting/visualization
Distributed systems tracing infrastructure
Log aggregation/analytics

With distributed systems (read: microservices), especially at scale, having observability into your platform is more than a necessity; it’s a requirement that can’t be circumvented by using only alerting or by only looking at logs. You need an environment that provides visibility down to a microscopic level in order to have the right information to act upon.
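As a rough illustration of the tracing pillar, the sketch below passes one trace ID through a chain of services and then walks the recorded spans to find where a request failed. It is a toy under stated assumptions: the service names, the in-memory `SPANS` store, and the 280-character check are all made up for the example, and a real system would ship spans to a dedicated tracing backend rather than a dictionary.

```python
import uuid
from collections import defaultdict

# Hypothetical span store; a real system would send spans to a tracing backend.
SPANS = defaultdict(list)


def record_span(trace_id, service, ok, note=""):
    """Append one span (a unit of work in one service) to the trace."""
    SPANS[trace_id].append({"service": service, "ok": ok, "note": note})


def post_tweet(text, trace_id=None):
    """Entry point: create a trace ID and pass it to every downstream call."""
    trace_id = trace_id or uuid.uuid4().hex
    record_span(trace_id, "api-gateway", ok=True)
    if store_tweet(text, trace_id):
        fan_out(trace_id)
    return trace_id


def store_tweet(text, trace_id):
    # Illustrative failure condition only.
    ok = len(text) <= 280
    record_span(trace_id, "tweet-store", ok, "" if ok else "text too long")
    return ok


def fan_out(trace_id):
    record_span(trace_id, "timeline-fanout", ok=True)


def first_failure(trace_id):
    """Walk the spans in order and return the first one that failed, if any."""
    for span in SPANS[trace_id]:
        if not span["ok"]:
            return span["service"], span["note"]
    return None


good = post_tweet("hello")
bad = post_tweet("x" * 300)
print(first_failure(good))
print(first_failure(bad))
```

Because every span carries the same trace ID, a failed request can be followed from the entry point to the exact service that rejected it, which is the kind of visibility alerting or raw logs alone don’t give you.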

To continue using our Twitter example, their observability system is humongous and took years to develop into the well-oiled machine it is today.

“Our time series metric ingestion service handles more than 2.8 billion write requests per minute, stores 4.5 petabytes of time series data, and handles 25,000 query requests per minute,” wrote Anthony Asta in 2016, describing the scope of their observability systems in a two-parter that covers architecture, metrics ingestion, the time series database, and indexing services. Check out part one and part two.

Here are my two cents: you get observability in your application by knitting together monitoring and alerting with a clear debugging solution that provides clarity for your data. Missing just one of these aspects will leave you at a great disadvantage, chasing your tail trying to figure out what went wrong within your app. It’s not enough to be notified every time something breaks down, and neither is having the insight to know when something is about to. You have to be able to pinpoint the issue within your platform efficiently.

Conclusion

I’d like to end with a simple statement that reflects my thoughts on the whole subject. Observability in the serverless space is non-negotiable. You have got to have it, and it’s not a quantifiable attribute, meaning you can’t have a little observability or too much of it. You either have it or you don’t.

Dashbird aims to help you solve that issue by providing insights into your application. While it’s not a universal observability solution that works for everyone, it does offer a lot of benefits, like failure detection, analytics, visibility, cost analysis, cold start information, alerting, and a lot more. And the best part: it’s free!
