Dashbird supports full-fledged, customizable failure detection and incident management for serverless applications. Incidents fall into two main categories: execution errors and metric condition failures. This article covers error alerting.

Execution errors are runtime exceptions, timeouts, crashes, out-of-memory errors, and configuration errors.

Dashbird automatically scans all function invocations for failures. Failures can be any of the following types: CRASH, TIMEOUT, OUT OF MEMORY, CONFIGURATION ERROR, EARLY EXIT. Dashbird failure detection supports all programming languages that are supported by Lambda (Node.js, Python, Java, C#, Go, and Ruby).
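To make the failure types concrete, here is a minimal Python sketch of how a single log line from an invocation could be mapped to one of these categories. It is not Dashbird's implementation, which this article does not describe: the `classify` function, the `FailureType` enum, and every text pattern below are illustrative assumptions, and real detection would need to handle many more cases.

```python
import re
from enum import Enum
from typing import Optional


class FailureType(Enum):
    CRASH = "CRASH"
    TIMEOUT = "TIMEOUT"
    OUT_OF_MEMORY = "OUT OF MEMORY"
    CONFIGURATION_ERROR = "CONFIGURATION ERROR"
    EARLY_EXIT = "EARLY EXIT"


# Hypothetical patterns (assumptions, not Dashbird's rules).
# They are checked in order and the first match wins, so the
# generic CRASH pattern is deliberately placed last.
PATTERNS = [
    (re.compile(r"Task timed out after"), FailureType.TIMEOUT),
    (re.compile(r"out of memory", re.IGNORECASE), FailureType.OUT_OF_MEMORY),
    (re.compile(r"Runtime\.(ImportModuleError|HandlerNotFound)"),
     FailureType.CONFIGURATION_ERROR),
    (re.compile(r"exited before completing request"), FailureType.EARLY_EXIT),
    (re.compile(r"Traceback|Exception|Error"), FailureType.CRASH),
]


def classify(log_line: str) -> Optional[FailureType]:
    """Return the first matching failure type, or None for a healthy line."""
    for pattern, failure_type in PATTERNS:
        if pattern.search(log_line):
            return failure_type
    return None
```

For example, `classify("Task timed out after 3.00 seconds")` returns `FailureType.TIMEOUT`, while a line that matches none of the patterns returns `None`.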

To view all errors, click the bug icon in the left menu.

Error states

Resolve errors – after fixing an error in your code, mark it as resolved in Dashbird. Otherwise, you will not be notified the next time that error occurs. You can resolve errors from the top right corner and view resolved errors under the “RESOLVED” tab.

Mute errors – if an error is unimportant to you, you can mute its notifications and remove it from the active errors list.
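The effect of these states on notifications can be sketched as a small state machine. This is a simplified Python model of the behavior described above, not Dashbird code: the `ErrorTracker` class and its method names are hypothetical, and it assumes that a first occurrence of an error always qualifies for a notification.

```python
from enum import Enum
from typing import Dict


class ErrorState(Enum):
    ACTIVE = "active"
    RESOLVED = "resolved"
    MUTED = "muted"


class ErrorTracker:
    """Hypothetical model of how error states gate notifications."""

    def __init__(self) -> None:
        self.states: Dict[str, ErrorState] = {}

    def record_occurrence(self, error_id: str) -> bool:
        """Record an occurrence; return True if a notification should be sent."""
        state = self.states.get(error_id)
        if state is ErrorState.MUTED:
            return False  # muted errors never notify
        if state is ErrorState.ACTIVE:
            return False  # already reported and not yet resolved
        # New or previously resolved: (re)open the error and notify.
        self.states[error_id] = ErrorState.ACTIVE
        return True

    def resolve(self, error_id: str) -> None:
        self.states[error_id] = ErrorState.RESOLVED

    def mute(self, error_id: str) -> None:
        self.states[error_id] = ErrorState.MUTED
```

In this model, an error that stays active keeps recurring silently, which is why marking it as resolved after a fix matters: only then does the next occurrence trigger a new notification.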

Configuring an alert policy

You have complete control over which errors get reported to you. To configure alerting rules, it’s necessary to create an alert policy. Navigate to the Policies tab and click + ADD. A new policy will be added to the list and you can start adding rules.

All policies must have at least one notification channel and one alert condition. A notification channel can be a Slack channel or an email address. An alert condition consists of an error condition and a selection of functions.

For example, you can define an alert for any error across all functions, or alert only on invocation timeouts for a specific function or microservice.
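The structure of a policy can be illustrated with a short Python sketch. Policies are configured in the Dashbird UI, so this is only a mental model under stated assumptions: the `AlertPolicy` and `AlertCondition` classes, their fields, and the channel and function names are all hypothetical and do not correspond to a Dashbird API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertCondition:
    # Hypothetical fields: None means "any error" / "all functions".
    error_type: Optional[str] = None
    functions: Optional[List[str]] = None

    def matches(self, error_type: str, function_name: str) -> bool:
        type_ok = self.error_type is None or self.error_type == error_type
        function_ok = self.functions is None or function_name in self.functions
        return type_ok and function_ok


@dataclass
class AlertPolicy:
    name: str
    channels: List[str]            # Slack channels or email addresses
    conditions: List[AlertCondition]

    def __post_init__(self) -> None:
        # Mirror the rule that a policy needs at least one of each.
        if not self.channels:
            raise ValueError("a policy needs at least one notification channel")
        if not self.conditions:
            raise ValueError("a policy needs at least one alert condition")

    def should_alert(self, error_type: str, function_name: str) -> bool:
        return any(c.matches(error_type, function_name) for c in self.conditions)


# Any error across all functions.
catch_all = AlertPolicy("catch-all", ["#alerts"], [AlertCondition()])

# Only timeouts for one specific (made-up) function.
checkout_timeouts = AlertPolicy(
    "checkout-timeouts",
    ["oncall@example.com"],
    [AlertCondition(error_type="TIMEOUT", functions=["checkout-handler"])],
)
```

Here `catch_all.should_alert("CRASH", "any-function")` is true, while `checkout_timeouts` fires only for a `TIMEOUT` in `checkout-handler`.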

Best practices for handling alerts

Alerting should be an ongoing process. We recommend testing different policies, adding new ones as needs arise, and deleting or muting unnecessary alerts to avoid alarm fatigue.

  • Set alerts for all production Lambda functions. Even if you think they’ll never fail, unexpected failures do happen.
  • Always mark errors as resolved after you’ve fixed them in code. This way, if the problem recurs, you’ll be notified again.

 

