
Passing the “Is it Working?” Test with Serverless Architectures Is Not Enough


This post is republished with the approval of its author, Paul Singman. Original post here.

Setting the Scene

Say you are an awesome developer sitting contentedly at your desk when a Slack message suddenly interrupts your peaceful mental flow:

[Image: the Slack message reporting a possible data issue]

It would appear there is a data issue with the new Activity History service released last month… Or at least a couple of people think there is.

Now, instead of making progress on new tasks, you need to drop them and look into what’s happening.

Sigh.

Setting up the Problem

The Activity History service calculates and then exposes counts of how many times users have used the company’s application.

If we’re Netflix, it’s how many episodes a user has watched. If we’re Spotify, perhaps it powers the popular Year In Review feature that shows how many minutes you’ve listened this year. [Answer: A lot.]

It is powered by a modern, serverless pipeline built on AWS with an architecture that looks like:

[Image: architecture diagram of the Activity History pipeline]

User activity is POSTed to an ingestion API Gateway endpoint. The Lambda function backing that API writes the data to a Kinesis Data Stream for temporary storage. Next, a second Lambda function, triggered by the stream, validates the schema of the ingested data and, if the data looks good, writes it to a DynamoDB table that holds the activity events for all users.
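A minimal sketch of the validation step may help make this flow concrete. It assumes the standard Kinesis-to-Lambda event shape, in which each entry in `event["Records"]` carries a base64-encoded payload under `kinesis.data`. The event schema (`user_id`, `activity_type`, `timestamp`) is hypothetical, since the post does not describe the fields, and the DynamoDB write is injected as a `put_item` callable so the sketch runs without AWS. In a deployed function, that callable could wrap boto3’s `Table.put_item(Item=...)`.

```python
import base64
import json
from typing import Any, Callable, Dict, List

# Hypothetical schema: the post does not say which fields an activity event has.
REQUIRED_FIELDS = {"user_id": str, "activity_type": str, "timestamp": int}


def is_valid(activity: Dict[str, Any]) -> bool:
    """True if every required field is present with the expected type."""
    return all(
        isinstance(activity.get(field), expected)
        for field, expected in REQUIRED_FIELDS.items()
    )


def process_records(
    records: List[Dict[str, Any]],
    put_item: Callable[[Dict[str, Any]], None],
) -> Dict[str, int]:
    """Decode Kinesis records, validate each one, and write the valid ones."""
    written = rejected = 0
    for record in records:
        # Kinesis delivers each record's payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            activity = json.loads(raw)
        except ValueError:  # covers malformed JSON and undecodable bytes
            rejected += 1
            continue
        if isinstance(activity, dict) and is_valid(activity):
            put_item(activity)  # e.g. a wrapper around a DynamoDB write
            written += 1
        else:
            rejected += 1
    return {"written": written, "rejected": rejected}
```

For example, a record whose payload is `{"user_id": "u1"}` is counted as rejected, because `activity_type` and `timestamp` are missing.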

Finally, we have an API Gateway endpoint backed by a Lambda that is responsible for fetching and aggregating records for a user to be shown on an Activity History screen in the mobile app.
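The read side can be sketched the same way. This assumes the Lambda has already fetched one user’s items from the DynamoDB table (for example, with a query on a hypothetical `user_id` key) and shows only the aggregation plus the response shape that API Gateway’s Lambda proxy integration expects: `statusCode`, `headers`, and a `body` that must be a string. The `activity_type` field is again a hypothetical name.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def aggregate_activity(items: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count a user's activity events per activity type."""
    return dict(Counter(item["activity_type"] for item in items))


def build_response(user_id: str, items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap the aggregate in a Lambda proxy integration response."""
    body = {
        "user_id": user_id,
        "total": len(items),
        "by_type": aggregate_activity(items),
    }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The proxy integration requires the body to be a string, not a dict.
        "body": json.dumps(body),
    }
```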

In my experience, this is a typical serverless architecture for an app that contains these types of features.

Anyway, to debug such a system and respond confidently to the inquiry from Slack, there are a number of things we should check:

  1. Is the API endpoint working and if so, what value does it return for this user?
  2. Is the Lambda function backing the API returning successfully?
  3. What value is stored in the DynamoDB table for this user?
  4. Is the Lambda function that validates data and writes to Dynamo experiencing any issues?
  5. How is the performance of the Kinesis Data Stream that triggers the Lambda?
  6. Are there any errors or latency issues in the Lambda function that ingests data and writes to Kinesis?
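The checklist can be treated as an ordered list of checks. The sketch below is a hypothetical helper, not a feature of any AWS tool: each check is a callable that returns `True` when its step looks healthy, and in practice it would wrap an API call, a log query, a metric lookup, or a table read. Because the steps run from the user-facing API back toward ingestion, the helper reports the furthest-upstream failure as the place to start, on the assumption that bad data upstream also shows up downstream.

```python
from typing import Callable, List, Optional, Tuple

# A check pairs a checklist question with a callable; True means "looks healthy".
Check = Tuple[str, Callable[[], bool]]


def triage(checks: List[Check]) -> Tuple[List[str], Optional[str]]:
    """Run every check and report which ones failed.

    Checks are ordered from the user-facing API back toward ingestion, so the
    last failure is the one furthest upstream: a reasonable first suspect
    (a heuristic, not a proof).
    """
    failures = [question for question, check in checks if not check()]
    likely_origin = failures[-1] if failures else None
    return failures, likely_origin


if __name__ == "__main__":
    # Hard-coded results standing in for real checks (hypothetical).
    checks = [
        ("API returns the expected value", lambda: False),
        ("Read Lambda returns successfully", lambda: True),
        ("DynamoDB holds the expected value", lambda: False),
        ("Validation Lambda has no errors", lambda: True),
        ("Kinesis stream is healthy", lambda: True),
        ("Ingestion Lambda has no errors", lambda: False),
    ]
    failures, origin = triage(checks)
    print(failures)
    print("start with:", origin)
```

With these hard-coded results, the helper points at the ingestion Lambda, even though the API and table checks fail too.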

That’s a lot to do, no?

Make no mistake…

AWS deserves praise for creating the services that make such functionality possible in the first place. However, we can also admit that out-of-the-box monitoring tools like CloudWatch Logs and Metrics don’t make debugging tasks like the one outlined above easy.

[Image: If your browser tabs look like this, you’re not doing it right.]

Speaking personally, having built and maintained serverless architectures over the last several years, I’ve found that debugging them quickly is crucial, at least if you want to take full advantage of the development speed serverless promises rather than spend most of your time investigating problems as they arise.

What’s the Solution?

The issue is that for each debugging step, there’s an isolated log group or metric graph to inspect, and frankly you’ll drive yourself crazy trying to pull up each one in a separate browser tab to identify the location of the issue.

A better approach would be to have access to a single centralized location from which you get a pulse on the recent behavior of all AWS resources. This allows you to narrow the scope of your investigation into the problem.

As with everything, you can either build this yourself or see whether there’s a solution to buy. After looking into what’s out there, I ended up connecting Dashbird, a serverless monitoring and intelligence tool, to my AWS account.


Within 5 minutes, I had unlocked a whole host of new functionality and insights into my serverless resources!

Now in a single browser tab, any issues that arise across our example Activity History Pipeline become visible. On the Dashbird Insights tab, we can see ConnectionErrors occurring in the Ingestion Lambda. And now we can dig further into the Logs and Performance of that Lambda function specifically, and in a short time triage the issue raised at the beginning of the article.

Had the source of the error been an API Gateway endpoint, a Kinesis Data Stream, or a DynamoDB table, Dashbird also provides dashboards that surface problematic behavior in each of these services.

[Image: Total Invocations, Invocation Errors, and Average Duration metrics on a single pane.]
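The numbers on such a pane are simple roll-ups; the hard part is collecting them in one place. As an illustration only (this is not how Dashbird computes them), the sketch below assumes per-invocation records with a function name, a duration, and an error flag have already been gathered, for example from each function’s logs, and reduces them to the three per-function figures in the caption. The `Invocation` type and the function names in the test data are hypothetical. In CloudWatch, the corresponding Lambda metrics are `Invocations`, `Errors`, and `Duration`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Invocation:
    function: str       # Lambda function name
    duration_ms: float  # duration of this invocation in milliseconds
    error: bool         # True if the invocation failed


def summarize(invocations: Iterable[Invocation]) -> Dict[str, Dict[str, float]]:
    """Per function: total invocations, invocation errors, average duration."""
    totals: Dict[str, Dict[str, float]] = {}
    for inv in invocations:
        t = totals.setdefault(
            inv.function, {"invocations": 0, "errors": 0, "total_ms": 0.0}
        )
        t["invocations"] += 1
        t["errors"] += int(inv.error)
        t["total_ms"] += inv.duration_ms
    return {
        name: {
            "invocations": t["invocations"],
            "errors": t["errors"],
            "avg_duration_ms": t["total_ms"] / t["invocations"],
        }
        for name, t in totals.items()
    }
```

Sorting the result by `errors` is one simple way to decide which function’s logs to open first.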

The Moral of the Story

If you are a developer of an application or data pipeline using serverless architectures, it can be exciting to get your first project up and running. The beauty of the modern cloud is how you can stitch together resources like an artist to achieve functionality in a seamless, integrated way.

In some sense, we’ve reached cloud-computing nirvana with Function-as-a-Service offerings like AWS Lambda, which integrate with countless other services and are billed in 1-millisecond increments.

If this is a development pattern you want to take full advantage of beyond an initial single process, it is imperative to extend your monitoring capabilities beyond the AWS defaults.

No matter how good you get at querying CloudWatch Logs, viewing Lambda metric graphs, or managing capacity on a DynamoDB table, once the number of resources under your watch grows into the tens or even hundreds, the time it takes to fix errors in the system will grow to untenable levels.

And what seemed like fun at the beginning will become a headache, until one morning you wake up and admit, “There must be a better way!”

If you find yourself spending more and more of your time looking into the performance of your serverless pipelines — or worse, not checking them at all — I recommend integrating with Dashbird or a similar tool.

For Dashbird specifically, you can learn more about their services on their website. If you come across other solutions in this space, I’m interested to hear about them as well!

