Enable alerting

Dashbird streams and scans all your serverless logs in real time, detecting code exceptions, timeouts, out-of-memory errors, cold starts, and other errors and anomalies.

Proactive alerts are sent by email, Slack, webhooks, and SNS as soon as issues are detected in your serverless stack. Incidents fall into two main categories: execution errors and metric condition failures.

With Dashbird, alarms and checks are centralized in one platform. To make it even easier, alarms can be set for any metric condition across your monitored infrastructure. For example, Lambda errors such as:

  1. CRASH
  2. TIMEOUT
  3. OUT OF MEMORY
  4. CONFIGURATION ERROR
  5. EARLY EXIT

and limits, such as:

  • timeout
  • out-of-memory error, etc.

Dashbird detects all types of application errors and exceptions in every runtime supported by AWS Lambda: Node.js, Python, Java, Ruby, Go, and .NET.
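As an illustration, here is the kind of unhandled exception, in a hypothetical Python Lambda handler, that surfaces as a stack trace in the function's logs and can then be reported as a crash (the handler and event shape are illustrative, not a Dashbird API):

```python
import json

def handler(event, context):
    # An event without a "body" key raises an unhandled KeyError.
    # AWS Lambda logs the stack trace, which log-based monitoring
    # tools such as Dashbird can parse and report as an error.
    body = json.loads(event["body"])
    return {"statusCode": 200, "body": json.dumps(body)}
```

Invoking the handler with a malformed event raises the exception instead of returning a response, and the failure appears in the logs without any instrumentation in the code itself.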

Alarms are fully customizable and can be configured for metrics related to a specific type of resource. For example, SQS queues are checked for a growing number of pending messages, DynamoDB tables have throttling and resource capacity consumption verified, and ECS containers have their resource usage tracked.
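As a sketch of the SQS check mentioned above, the signal being alarmed on is a pending-message count that keeps climbing between samples (the function name and sample values are illustrative, not Dashbird's implementation):

```python
def backlog_growing(samples):
    """samples: ApproximateNumberOfMessages readings over time, oldest first.

    Returns True only if the backlog grew at every sampling step,
    i.e. consumers are not keeping up with producers."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))

backlog_growing([100, 180, 260, 410])  # steadily growing backlog -> True
backlog_growing([100, 90, 95, 80])     # queue draining -> False
```

In practice the readings would come from the queue's `ApproximateNumberOfMessages` attribute; the decision logic itself stays this simple.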

To learn more about the types of alarms and events supported by Dashbird, see our Events Library.

Alarms

The Alarms service in Dashbird is used to set custom alarms for each resource type supported by Dashbird. Each service type has its own metrics that can be used to set up custom alarms.

Dashbird alarms

For instance, for Lambda functions, we can create two types of alarms: critical and warning. Next, we select the metric the alarm should trigger on. This could be:

  1. error, cold start, retry, or invocation count,
  2. throttled count,
  3. execution or billed duration,
  4. memory used,
  5. cost incurred,
  6. or concurrent Lambda executions.

After selecting the metric, we specify when the alarm triggers, i.e. when the selected metric is above or below a specific threshold (e.g., 5, 10, 20) on average/max/min/sum over a period measured in minutes, hours, or days.

Finally, we select one or more target resources for which the alarm should be set. The resulting alarm reads something like: trigger an alarm for a Lambda function when the error count is above 10 on average over the last 15 minutes.
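That example rule can be sketched as a small evaluation function. Dashbird's actual alarm engine is internal, so the function below and its sample data are purely illustrative:

```python
from statistics import mean

def alarm_triggered(datapoints, threshold, aggregation=mean):
    """Evaluate one alarm rule: does the aggregated metric exceed the threshold?

    datapoints: metric samples collected over the alarm's time window,
    e.g. per-minute error counts from the last 15 minutes."""
    if not datapoints:
        return False
    return aggregation(datapoints) > threshold

# Per-minute error counts over a 15-minute window; the average is ~11.7,
# so an "error count above 10 on average" alarm would fire.
errors = [12, 8, 15, 11, 9, 14, 10, 13, 12, 11, 9, 16, 10, 12, 13]
alarm_triggered(errors, threshold=10)  # True
```

Swapping `mean` for `max`, `min`, or `sum` covers the other aggregation choices described above.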

Just like for Lambda functions, alarms for every other service can be created using the metrics specific to that service.

When an alarm is triggered, notifications are dispatched to each notification channel that has been set up. Currently, the supported notification channels are:

  1. email
  2. Slack
  3. webhooks
  4. SNS
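For the webhook channel, the receiving end is simply an HTTP endpoint that parses the notification body. Here is a minimal sketch of such a handler, assuming a JSON payload; the field names ("severity", "resource") are hypothetical, so consult the actual webhook payload for the real schema:

```python
import json

def route_notification(raw_body):
    """Decide what to do with an incoming alert notification.

    The payload fields used here are assumptions for illustration,
    not Dashbird's documented webhook schema."""
    payload = json.loads(raw_body)
    severity = payload.get("severity", "warning")
    resource = payload.get("resource", "unknown resource")
    if severity == "critical":
        return f"page the on-call engineer about {resource}"
    return f"post a message to the team channel about {resource}"

route_notification('{"severity": "critical", "resource": "checkout-lambda"}')
```

Routing critical alarms to a paging system and warnings to a chat channel is a common way to use the two alarm severities described earlier.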

Best practices for handling alerts 

Alerting should be an ongoing process: set up different alarms, add new ones as needs arise, and delete or mute unnecessary alerts to avoid alarm fatigue.

  • Set alerts for all production Lambda functions. Even if you think they’ll never fail, unexpected circumstances do happen.
  • Always resolve alerts after you’ve fixed them in code. This way, if the problem reoccurs, you’ll be notified again.

To view all errors, click here or on the bug icon in the menu bar in the Dashbird app.

 

Can’t find what you’re looking for? We’d love to help. Send us a message through the chat bubble or email us.
