Program a proactive alerting system to stay on top of the serverless stack
Although serverless is very stable and reliable, many things can go wrong with our software. A hardware failure may cause glitches, network instability can disrupt API communications, and the application itself can present unexpected bugs.
AWS Lambda, for example, has three types of errors, as discussed in Lambda: Invocation, Function and Runtime Errors. Since developers don’t have access to the underlying infrastructure in serverless systems, logs are usually piped to a central repository (e.g. AWS CloudWatch Logs [1]).
In some cases, errors are simply returned from API calls. It is the application's responsibility to handle and log these error messages; otherwise, it will be impossible to detect and inspect them later.
Take DynamoDB and its capacity modes [2]. If the request rate exceeds the table's provisioned capacity, DynamoDB returns a ProvisionedThroughputExceededException. A Lambda function that queries the table must catch and log this error so that it is stored in CloudWatch Logs as well.
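As a minimal sketch of that logging step, the helper below wraps any query callable and logs throttling errors before re-raising them. It assumes the exception carries a botocore-style `response` dictionary with an `Error.Code` entry; the names `run_query`, `error_code` and `query_fn` are hypothetical and chosen only for illustration.

```python
import logging

logger = logging.getLogger(__name__)

THROTTLE_CODE = "ProvisionedThroughputExceededException"


def error_code(exc):
    """Return the AWS error code carried by a botocore-style exception, if any."""
    # Assumption: the exception exposes response["Error"]["Code"].
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def run_query(query_fn, **params):
    """Call query_fn(**params); log throttling errors, then re-raise them."""
    try:
        return query_fn(**params)
    except Exception as exc:
        if error_code(exc) == THROTTLE_CODE:
            # Inside Lambda, output from the logging module lands in CloudWatch Logs.
            logger.error("DynamoDB throttled the request: %s params=%s",
                         THROTTLE_CODE, params)
        raise
```

Re-raising keeps the failure visible to the caller, so the log entry supplements the error rather than hiding it.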
Failure detection is the process of inspecting logs and identifying all strings and patterns that indicate whether an error occurred in an application.
Logging errors is only the first step. Even for small applications, the amount of log information can easily become impossible for humans to parse. This is when a failure detection algorithm is valuable.
Such an algorithm will automatically identify a DynamoDB error, a Python or Node.js exception (e.g. TypeError) or an AWS Lambda misconfiguration, for example.
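To make the idea concrete, here is a deliberately simple sketch of pattern-based failure detection over log lines. The pattern table and the `detect_failures` function are hypothetical; a production detector would need far more rules and is not how any particular service is known to work.

```python
import re

# Hypothetical pattern table, for illustration only.
FAILURE_PATTERNS = {
    "dynamodb_throttling": re.compile(r"ProvisionedThroughputExceededException"),
    "python_exception": re.compile(r"\b(TypeError|KeyError|ValueError)\b"),
    "lambda_timeout": re.compile(r"Task timed out after \d+(\.\d+)? seconds"),
}


def detect_failures(log_lines):
    """Return (line_number, failure_type, line) for every line that matches a pattern."""
    findings = []
    for number, line in enumerate(log_lines, start=1):
        for failure_type, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, failure_type, line))
    return findings
```

Feeding it the lines of a log stream yields a list of findings that can then be grouped by failure type or by function.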
Waiting for a customer or manager to discover an error and report it to the development team can erode trust in the application. It is much better when developers are the first to learn about a failure and can proactively notify users, or perhaps even ship a quick fix.
For that reason, professional development teams should couple their failure detection algorithms with an error alerting mechanism. Whenever something fails, the system should alert the responsible development team through the most convenient channels (e.g. email, Slack).
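A rough sketch of the alerting step, assuming a Slack incoming webhook as the channel. Incoming webhooks accept a JSON body with a `text` field; the function names and the `opener` parameter are assumptions made for this example so the network call can be swapped out.

```python
import json
import urllib.request


def build_slack_payload(function_name, failure_type, sample_line):
    """Build the JSON body for a Slack incoming webhook."""
    text = f"[{failure_type}] in {function_name}: {sample_line}"
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(webhook_url, payload, opener=urllib.request.urlopen):
    """POST the payload to the webhook; opener is injectable for testing."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=5) as response:
        return response.status
```

Keeping payload construction separate from delivery makes it straightforward to add another channel, such as email, without touching the detection logic.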
Traditional logging and monitoring services, whether from large established vendors or open source projects, are often a poor fit for serverless, simply because serverless is not a traditional architecture. It requires specialized error inspection and alerting.
AWS CloudWatch Logs, for example, does not provide failure detection algorithms tailored to serverless, and alerting has to be configured by hand through metric filters and alarms. On its own, it serves mainly as a central log repository.
Having a dev team implement its own failure detection and alerting system would be a waste of time. There are professional services tailored for serverless – such as Dashbird – that can provide a much better solution at a fraction of the internal development cost.
Dashbird works by reading logs from CloudWatch Logs. It is an asynchronous way of inspecting failures that does not require code instrumentation and neither interferes with nor degrades application performance. Please check How Dashbird Works for more information.
The best professional monitoring platforms also provide a failure management system. It helps organize failures that are still pending, mark errors that have been fixed as resolved, and so on.
In Dashbird, for example, errors are tracked in different states: open, resolved, or muted.
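The three states can be modelled with a small state holder. This is only an illustrative sketch: the class names are hypothetical, and the rule that a new occurrence reopens a resolved failure while a muted one stays silent is an assumption, not a description of Dashbird's behavior.

```python
from dataclasses import dataclass
from enum import Enum


class FailureState(Enum):
    OPEN = "open"
    RESOLVED = "resolved"
    MUTED = "muted"


@dataclass
class Failure:
    function_name: str
    failure_type: str
    state: FailureState = FailureState.OPEN

    def resolve(self):
        self.state = FailureState.RESOLVED

    def mute(self):
        self.state = FailureState.MUTED

    def record_occurrence(self):
        """Register a new occurrence and report whether an alert should be sent."""
        # Assumed rule: resolved failures reopen, muted failures stay muted.
        if self.state is FailureState.RESOLVED:
            self.state = FailureState.OPEN
        return self.state is FailureState.OPEN
```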
Another benefit is setting up custom alerting policies. Developers can control which AWS Lambda functions to monitor, for example. Perhaps testing and experimental functions can be ignored, but production ones must be monitored closely.
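One way to picture such a policy is as a set of name patterns plus the failure types it covers. The sketch below is hypothetical: the `AlertPolicy` class, its fields and the naming scheme (`prod-*`, `*-experimental`) are assumptions made for illustration, not an actual policy format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AlertPolicy:
    name: str
    include: tuple  # glob patterns for function names to monitor
    exclude: tuple = ()  # glob patterns that take precedence over include
    failure_types: tuple = ("lambda_timeout",)

    def applies_to(self, function_name, failure_type):
        """Return True if this policy should alert on the given failure."""
        if failure_type not in self.failure_types:
            return False
        if any(fnmatchcase(function_name, p) for p in self.exclude):
            return False
        return any(fnmatchcase(function_name, p) for p in self.include)


# Example: alert on timeouts in production functions, skipping experimental ones.
policy = AlertPolicy(
    name="prod-timeouts",
    include=("prod-*",),
    exclude=("*-experimental",),
)
```

Giving exclusions precedence keeps test and experimental functions quiet even when their names also match a broad include pattern.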
The screenshot below illustrates a policy to monitor timeout failures in a given Lambda function.
The Dashbird documentation provides more details on the Enabling Error Alerting features and Notification Channels.