In this article, we rewind to the very beginning of the AWS Well-Architected Framework to understand how and why it came to be, and why it is so important, yet often underrated, for serverless developers to learn, understand, and apply this framework of best practices. We’ll also look at how the framework has evolved and how it should be used in 2021.
For a deeper dive into each of the five pillars, you can download this free e-book, which dissects the pillars and explains, in easily understandable terms, why these best practices matter and how to implement them in real life.
The AWS Well-Architected Framework came about in 2012 in response to the build-out of the AWS portfolio. Industry and user feedback consistently showed that while there was plenty of documentation on the services, there wasn’t enough on best practices. As ever, AWS listened to its users, and it published its first framework in 2015.
In 2016, the Operational Excellence pillar was introduced, again as a result of user feedback. The original Well-Architected Framework was rightly technically heavy, but users needed and wanted more on improving operational posture, reducing heavy lifting, and improving the day-to-day running of AWS infrastructure.
By the next year, services had proliferated massively, and AWS wanted to cater for different kinds of environments, seeing that the AWS Reviews were very different for classic and serverless architectures. So, in 2017, lenses were introduced: overlays on the original framework that speak to specific workloads. This meant that the Reviews and the adoption of AWS could be much more refined. We’ll dig deeper into the lenses later in this article.
In 2020, the framework received some big updates, including updates to all pillars. AWS also launched a Serverless Lens, which can be found in the Well-Architected Tool within the console.
Often, customers start with experimentation, and workloads tend to develop organically through successive additions. During this growth, it is common to deviate from best practices, which makes it harder to layer further complexity on top. For this reason, we and AWS want to teach customers to align what they already have with best practices to ensure faster deployment and a better security posture.
Risks span all five pillars of the Framework (Reliability, Performance Efficiency, Operational Excellence, Cost Optimization, and Security), and the Review process works to lower or mitigate them over time. Many customers aren’t sure what their risk profile is, which becomes a big worry for C-level executives, who start asking “where do we sit in our risk profile?” and “how do we work to reduce that over a 6-12 month period?”
With education and knowledge come more power and better-informed decisions. Let’s say a customer has a 2-3 year old workload that needs to be redeployed into a new environment with segregated accounts. Their options are to improve the existing environment or to migrate the workload to a new one. The Well-Architected Review will show the work needed for remediation, including the upfront changes, ongoing maintenance, and costs.
We have found that the happiest customers are those who feel well-educated about the AWS platform and its services. This includes instilling best practices and knowledge of new releases, release cycles, and how services evolve. From here, they are more willing to share and to build greater trust with AWS as a service.
The pillars are the cornerstone of any AWS architecture, and this vertical segregation can be applied to any workload, whether that is Serverless, EC2, or Big Data.
The design principles represent the goals we are aiming to achieve in each pillar. Looking at the objectives of the workloads is key here, and a Review involves many questions, architectural diagrams, and a gathering of information before the review itself. The aim is to have a data-driven, informative review.
The intent of a Review is to provide insight into best practices for AWS. It’s important to remember that it is not aligned with an audit or any regulatory body and that the data isn’t shared. It is simply there to improve posture against all Well-Architected Framework pillars.
The Reviews provide pragmatic, proven advice that AWS knows works and that is tailored to the customer’s needs.
For example, if the customer has a tight security posture requirement, we will bias the review towards that.
When it comes to an AWS Review, it’s important to keep in mind that these aren’t intended as a one-time check. Reviews and best-practice sessions should be run at a regular cadence; twice a year is often sufficient to avoid any glaring holes, and where holes do exist, we want to find them early. It’s simple: a regular cadence provides greater efficiency.
The Well-Architected Lenses were created to be specific to workload type. While running the same review over diverse workloads was useful, AWS wanted to allow more specificity, so over the coming years more lenses will be added to the Well-Architected Tool itself.
The design principles are specific to each lens, and the lens documentation includes popular scenarios; for instance, the Serverless Lens includes RESTful APIs and mobile devices. There are also lenses for High Performance Computing and IoT.
At their core, the lenses are there to have maximum effect in working towards a customer’s business outcome.
It’s important that functions remain concise and single-purpose in nature. Customers are already moving away from monolithic designs; however, what we’re seeing is Lambda code sprawling out like tentacles. We don’t want Swiss Army knife Lambdas!
Making full use of the concurrency model is a trade-off made at the start. Remember that what matters is the number of concurrent requests, not the total number of requests.
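As a rough illustration of thinking in concurrent rather than total requests, concurrency can be approximated as request rate multiplied by average duration. The sketch below is not taken from the framework itself; the function name and the traffic figures are assumptions made up for this example.

```python
import math


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Estimate how many executions are in flight at the same time.

    Uses the common approximation: concurrency = request rate x average duration.
    """
    return math.ceil(requests_per_second * avg_duration_seconds)


# Hypothetical traffic figures, for illustration only.
steady = estimate_concurrency(requests_per_second=100, avg_duration_seconds=0.5)
slow_dependency = estimate_concurrency(requests_per_second=100, avg_duration_seconds=3.0)

print(steady)           # 50
print(slow_dependency)  # 300
```

The request rate is identical in both cases; only the duration changes, which is why a slow downstream dependency can exhaust a concurrency limit without any growth in traffic.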
Functions are, by nature, short-lived, so the underlying infrastructure isn’t guaranteed to persist between invocations. For durable requirements, use persistent storage that is decoupled from the function.
Using technology and hardware in an agnostic way means that code will keep working over time.
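One way to picture decoupled, persistent storage is a handler that keeps no state of its own and receives its store from outside. This is only a sketch: `CounterStore`, `InMemoryStore`, and `handle_visit` are hypothetical names, and the in-memory class is a local stand-in for whatever durable service (a database or object store, for example) a real workload would use.

```python
from typing import Dict, Protocol


class CounterStore(Protocol):
    """Hypothetical interface for a durable store that lives outside the function."""

    def increment(self, key: str) -> int: ...


class InMemoryStore:
    """Stand-in for a durable store, used here only so the sketch runs locally."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def handle_visit(event: dict, store: CounterStore) -> dict:
    """Handler keeps no state between invocations; durable data lives in the store."""
    count = store.increment(event["user_id"])
    return {"user_id": event["user_id"], "visits": count}


store = InMemoryStore()
first = handle_visit({"user_id": "u1"}, store)
second = handle_visit({"user_id": "u1"}, store)
print(first["visits"], second["visits"])  # 1 2
```

Because the handler never relies on module-level variables or local disk, it behaves the same whether an invocation lands on a warm or a brand-new execution environment.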
Orchestrating with state machines rather than functions is undoubtedly one of the key benefits of Serverless. Chaining functions together is akin to our standard monolith designs, so please be mindful and don’t fall into that trap. AWS has state machine structures to build out the complex orchestration needed, so make full use of them.
With this in mind, combining precise functions means that you can build out those complex workflows much more easily.
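To make the state machine idea concrete, here is a minimal sketch of a workflow definition in Amazon States Language, built as a Python dictionary and serialized to JSON. The workflow, the state names, and the Lambda ARNs are all hypothetical and stand in for two small, single-purpose functions.

```python
import json

# Hypothetical Lambda ARNs; replace with real function ARNs.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-payment"


def build_order_workflow() -> dict:
    """Return an Amazon States Language definition chaining two single-purpose functions."""
    return {
        "Comment": "Illustrative order workflow",
        "StartAt": "ValidateOrder",
        "States": {
            "ValidateOrder": {
                "Type": "Task",
                "Resource": VALIDATE_ARN,
                "Next": "ChargePayment",
            },
            "ChargePayment": {
                "Type": "Task",
                "Resource": CHARGE_ARN,
                # The state machine, not the function code, owns the retry policy.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            },
        },
    }


definition = build_order_workflow()
print(json.dumps(definition, indent=2))
```

Neither function calls the other: the ordering and the retry behaviour sit in the definition, so each function can stay small and be reused in other workflows.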
Using events to trigger transactions is a biggie in Serverless. Apply this principle to ensure events and responses align with business functionality.
Designing for failures and duplicates is another major component of Serverless. Ensure that your code includes appropriate retries for downstream calls.
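A retry for a downstream call might look something like the following sketch, which uses exponential backoff with jitter. `TransientError`, `call_with_retries`, and the default values are assumptions for illustration, not a prescribed implementation; a real service would map its own timeout and throttling errors onto the retryable case.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure to the caller.
            # Full jitter: wait a random time up to base * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_seconds * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")


# Usage: a fake downstream call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_downstream() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated timeout")
    return "ok"


result = call_with_retries(flaky_downstream, sleep=lambda _: None)
print(result, calls["count"])  # ok 3
```

Retries mean the downstream service may see the same request more than once, so the operation being retried should be safe to repeat.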
Some of the common issues are:
With such a large surface area and so much data moving around, observing your infrastructure is one of the hardest elements to keep up with. To make this easier, data should be centralized and made as accessible as possible.
It’s incredibly important not to view logs, metrics, and traces in silos, but instead to look across the spectrum of your managed services. Ask, for example: how does your SQS queue interact with your Lambda functions?
Tooling also helps reduce time to discovery and resolution. Good tooling tells you when something is wrong, what has gone wrong, and the best way to fix it. In doing so, it naturally encourages best practices too, making it the best way to automate all of the above.
Tooling for Serverless, particularly for monitoring, security, alerting, and failure detection, should come down to automation and abstraction.
Take Cognito and SQS queues as examples: for a team to use SQS, they first need to implement the queues, then understand the risks and monitor them. As you add new queues and functions, that alert coverage and monitoring for unknown failures must be extended to the rest of the infrastructure.
It’s important, therefore, that tooling constantly adjusts itself to the ever-changing infrastructure.
It’s a bit of a no-brainer, but important to highlight, that tooling helps to manage the underlying infrastructure. The log pipeline and log ingestion can be managed, as can an alarm or alerting system. This really is the Serverless way!
As touched on before, it also enables learning. A good tool, such as Dashbird, makes it clear how the system has worked historically and how changes have affected its performance over time.
Dashbird connects to your AWS account with read-only permissions and ingests data across all managed services, including logs, metrics, and traces. If anything happens in your infrastructure, there is just one place to see all the information, and you can view it at the account level, the detailed transaction level, or the execution level.
Having this one place means you don’t have to go around looking for the relevant data, and we see this as increased effectiveness that enables companies to build faster.
Dashbird detects everything in your logs that indicates failure, and watches metric data points across all managed services for higher delays or failures, automatically alerting you when they occur. The app will also tell you what you should focus on, so you don’t have to manage this yourself.
By looking at all data points of the system, we realized we could do a lot more abstraction and automation using the data we already have. It made sense, then, for Dashbird to check how architecture can be improved to follow best practices.
You should aim to have one managed, automatically scalable central place for all the data you have and need for your infrastructure and to have it as readily available and accessible as possible.
Dashbird is designed to give back development time, confidence, and understanding of the system, allowing you to search, query, and visualize logs, metrics, and traces. You can also go into the transaction level to see other services a function has interacted with, and see relevant cold starts, memory usage, and retries. This shows how retries are linked in time and whether they were successful. Dashbird also offers this for a wider range of AWS services, not just functions.
When you onboard, Dashbird immediately starts discovering and ingesting data from your resources. For example, let’s say you have 1000 Lambda functions; once they are connected to Dashbird, we will tell you about the code exceptions, timeouts, and configuration issues on all of them, and act as an incident management platform. We also look at invocations and any underlying events that caused them, and the same goes for metrics too.
There is a broad variety of failure checks within Dashbird, including high delays in SQS or items being consumed more slowly than they enter the queue. These sorts of issues are found in real time, and the same applies to other services, such as ECS clusters or Kinesis throttling.
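To give a feel for what a queue backlog check involves, here is a minimal sketch. It is not Dashbird’s implementation: `QueueSample`, `is_falling_behind`, and the five-minute threshold are hypothetical, and the fields only loosely mirror SQS CloudWatch metrics such as NumberOfMessagesSent, NumberOfMessagesDeleted, and ApproximateAgeOfOldestMessage.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QueueSample:
    """One observation window for a queue (field names are hypothetical)."""

    messages_sent: int      # messages that entered the queue in the window
    messages_deleted: int   # messages consumers finished and removed
    oldest_message_age_seconds: float


def is_falling_behind(samples: Sequence[QueueSample], max_age_seconds: float = 300.0) -> bool:
    """Flag a queue whose consumers lag in every window or whose oldest message is too old."""
    if not samples:
        return False
    lagging_every_window = all(s.messages_deleted < s.messages_sent for s in samples)
    too_old = samples[-1].oldest_message_age_seconds > max_age_seconds
    return lagging_every_window or too_old


# Made-up sample data, for illustration only.
healthy = [QueueSample(100, 100, 5.0), QueueSample(120, 125, 4.0)]
backlog = [QueueSample(100, 60, 40.0), QueueSample(120, 70, 90.0)]

print(is_falling_behind(healthy))  # False
print(is_falling_behind(backlog))  # True
```

Requiring the lag to show up in every window avoids alerting on a single short burst, while the age threshold catches a queue that has stalled outright.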
Currently, we have over 70 community-curated Well-Architected checks covering a variety of elements: the risk of running out of memory, over-provisioned resources, database encryption requirements, and whether detailed monitoring or logging is enabled.
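As an illustration of what a memory-related check could look like, the sketch below reads the memory figures from a Lambda REPORT log line and classifies utilisation. It is an assumption-laden example rather than Dashbird’s actual check: `classify_memory`, its labels, and the 90% and 30% thresholds are invented for this sketch.

```python
import re
from typing import Optional

# Matches the memory fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")


def classify_memory(report_line: str, high: float = 0.9, low: float = 0.3) -> Optional[str]:
    """Classify one Lambda REPORT log line by memory utilisation.

    The 90% and 30% thresholds are arbitrary assumptions for this sketch.
    """
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        return None  # Not a REPORT line.
    configured_mb, used_mb = int(match.group(1)), int(match.group(2))
    utilisation = used_mb / configured_mb
    if utilisation >= high:
        return "risk-of-memory-exhaustion"
    if utilisation <= low:
        return "over-provisioned"
    return "ok"


# Made-up REPORT line, for illustration only.
line = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 1024 MB\tMax Memory Used: 96 MB"
)
print(classify_memory(line))  # over-provisioned
```

A single invocation is a weak signal; in practice a check like this would look at many invocations over time before suggesting a change to the memory setting.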
We’ve ensured that we cover all five of the Well-Architected pillars and provide an actionable list of items showing you what incident affects your system in which pillar and to what extent.
Dashbird increases reliability and iteration speed. Developers have said they’re able to work up to 80% faster with Dashbird telling them when something is wrong and showing them the ins and outs of their system. It also helps in the development workflow.
At Dashbird, we understand that the core idea and value of serverless is to focus on the customer and the ability to avoid undifferentiated heavy lifting. That’s what we provide. We give the focus back to developers to only think about the end-customer and to not be distracted by debugging and alarm management or to worry if something is working or not.
We also help users achieve industry best practices, with an effect on cost optimization, performance optimization, and the overall posture of your infrastructure.
This article was put together based on a Dashbird webinar with Tim Robinson, Well-Architected Framework’s Geo Lead at AWS, and Taavi Rehemägi, CEO at Dashbird.