A scalable system is one that can continue to perform in a reliable manner under variable and often increasing levels of load.
A system’s scalability is rarely a single-variable analysis. It usually involves at least two dimensions: a load metric and time.
The first thing developers need to do is define what load means for each of their systems.
Load could mean something different for each type of system. For a website, it can be visitors or pageviews per second. For a database, it could be concurrent queries, number of IO operations, or amount of data getting in and out of the database servers.
How load is described will also depend on the system architecture.
In an e-commerce website, for example, the system may scale to serve 100,000 people shopping at the same time across a thousand-item catalog. But what happens if 20% of those are shopping for a single item?
This is the sort of circumstance that happens due to market trends and human behavior. Developers must account for these factors when thinking about load.
The more developers strive to anticipate possible challenging load scenarios for the system, the better it will behave in reality.
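The e-commerce scenario above can be put into numbers with a short sketch. The figures come from the example in the text; the variable names and the assumption that the remaining shoppers spread evenly over the rest of the catalog are only illustrative.

```python
# Figures from the e-commerce example; names are illustrative.
concurrent_shoppers = 100_000
catalog_size = 1_000
hot_item_share = 0.20  # fraction of shoppers looking at one single item

# If traffic were spread evenly, every item would see the same load.
uniform_load_per_item = concurrent_shoppers / catalog_size

# With a trending item, one item absorbs a fixed share of all shoppers.
hot_item_load = concurrent_shoppers * hot_item_share

# Assumption: the remaining shoppers spread evenly over the other items.
other_items_load = (concurrent_shoppers - hot_item_load) / (catalog_size - 1)

# How much heavier the hot item is compared to the evenly spread case.
skew_factor = hot_item_load / uniform_load_per_item

print(f"uniform load per item:   {uniform_load_per_item:.0f} shoppers")
print(f"hot item load:           {hot_item_load:.0f} shoppers")
print(f"load on each other item: {other_items_load:.1f} shoppers")
print(f"hot item vs uniform:     {skew_factor:.0f}x")
```

The total number of shoppers is unchanged, yet whatever serves the hot item (a database row, a cache key, an inventory counter) has to cope with roughly 200 times the load an evenly spread estimate would predict.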
It is necessary to consider:
Resources can scale up (vertically, by moving to a more powerful machine) or out (horizontally, by spreading the load across more machines).
Many healthy architectures mix both approaches. Sometimes, having many small servers is cheaper than a few high-end machines, especially for highly variable loads. Large machines can lead to increased over-provisioning and wasted idle resources. In other cases, a big machine may be faster and cheaper than a cluster.
It really depends on the case and developers must try different approaches to find one that suits both performance requirements and project budget.
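The over-provisioning effect of large machines under variable load can be illustrated with a rough sketch. Everything here is hypothetical: the hourly load values, the two machine sizes, and the assumption that capacity is resized every hour in whole machines. Price per machine is deliberately left out.

```python
import math

def provisioned_capacity(load, unit_capacity):
    """Smallest multiple of unit_capacity that covers the load."""
    return math.ceil(load / unit_capacity) * unit_capacity

def idle_capacity(loads, unit_capacity):
    """Capacity provisioned but unused, summed over all periods."""
    return sum(provisioned_capacity(load, unit_capacity) - load for load in loads)

# Hypothetical, highly variable load in requests per second, one value per hour.
hourly_load = [120, 90, 300, 850, 400, 150]

# Hypothetical machine sizes: requests per second one machine can serve.
small_unit = 100
large_unit = 1000

idle_small = idle_capacity(hourly_load, small_unit)
idle_large = idle_capacity(hourly_load, large_unit)

print(f"idle capacity with small machines: {idle_small} req/s-hours")
print(f"idle capacity with large machines: {idle_large} req/s-hours")
```

With these made-up numbers the small machines leave far less capacity idle. Whether that makes them cheaper still depends on the price per unit of capacity and on the coordination overhead of a cluster, which is why trying both approaches remains necessary.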
Using serverless systems greatly reduces the responsibility developers have over how systems cope with load. These services abstract away decisions about scaling up or out, for example, and also provide SLAs that the development team can rely on.
Metrics need some form of aggregation or statistical representation. The average (arithmetic mean) is usually a poor way to represent metrics because it can be misleading: it does not tell how many users actually experienced that level of performance. In reality, it is possible that no user experienced it at all.
Consider the following application load and user base:
The average response time would be 180 ms. But no user actually experienced that response time. In fact, 75% of the users experienced performance worse than the average. The arithmetic mean is highly sensitive to outliers, which are present in the distribution above.
This is why percentiles are more commonly used to express system performance. They are also the basis for service level objectives (SLOs) and agreements (SLAs).
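A small sketch shows how a mean can describe nobody's experience. The four response times below are hypothetical, chosen only so that they reproduce the figures quoted above (a 180 ms average with 75% of users slower than it); they are not the article's original data.

```python
from statistics import mean

# Hypothetical samples in milliseconds, picked to match the quoted figures.
response_times_ms = [10, 220, 240, 250]

average_ms = mean(response_times_ms)

# Requests that were slower than the average.
slower_than_average = [t for t in response_times_ms if t > average_ms]
share_slower = len(slower_than_average) / len(response_times_ms)

# Did any request actually take exactly the average time?
anyone_experienced_average = average_ms in response_times_ms

print(f"average: {average_ms} ms")
print(f"share of requests slower than average: {share_slower:.0%}")
print(f"any request at the average: {anyone_experienced_average}")
```

A single very fast request (the 10 ms outlier in this made-up sample) is enough to pull the mean well below what three out of four users saw.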
The most common percentiles are 95th, 99th and 99.9th (also often referred to as p95, p99 and p999).
A p95 value is the threshold below which at least 95% of the response times fall. In the example above, our p95 would be 250 ms. Since we have only a handful of request samples, the threshold would be the same for p95, p99 and p999. If we were to compute a p75, it would be 240 ms, because 3 out of 4 requests were served within 240 milliseconds.
This article was heavily inspired by the book Designing Data-Intensive Applications, by Martin Kleppmann.
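One way to compute such thresholds is the nearest-rank method, sketched below. It is one of several percentile definitions (interpolating methods can return different values), and the sample data is hypothetical: four response times chosen to be consistent with the figures in the text, not the article's original data.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]

# Hypothetical response times in milliseconds.
response_times_ms = [10, 220, 240, 250]

p75 = percentile(response_times_ms, 75)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
p999 = percentile(response_times_ms, 99.9)

print(f"p75={p75} ms, p95={p95} ms, p99={p99} ms, p999={p999} ms")
```

With only four samples, every percentile above 75 lands on the slowest request, which is why the high percentiles collapse to the same value here. Real workloads need many more samples before p99 or p999 become meaningful.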