Global Infrastructure

Dashbird continuously monitors and analyses your serverless applications to ensure reliability, cost and performance optimisation and alignment with the AWS Well-Architected Framework.


Overview

The following concepts are key to understanding how the AWS physical infrastructure is architected:

Region
Availability Zone (AZ)
Edge

AWS plans its infrastructure around an elaborate strategy to offer highly available, resilient¹ and scalable² services. AWS abstracts most infrastructure management tasks away from its users, from renting data center real estate to wiring up multiple machines in a local network.

In spite of following a distributed model with many levels of replication (hardware, data, software, network), parts of this infrastructure occasionally fail, and it is difficult to predict which ones will fail, when, and how.

When these systems do fail, having separate Regions and AZs enables AWS to keep providing services to its customers with minimal or no disruption. The model isn't completely fail-safe: some failures can still be disruptive, but they are rare.

Availability Zone (AZ)

A collection of data centers representing a partition of the AWS infrastructure and services. Each data center is hosted in a separate facility and may have hundreds of thousands of machines.

Within each Region, AZs are interconnected with high-throughput, low-latency networking over a fully redundant network of dedicated metro fiber³. By replicating application resources across different AZs, AWS provides redundancy against natural events and disasters (lightning strikes, tornadoes, flooding, etc.).
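To make the benefit of spreading resources across AZs concrete, the following Python sketch assigns hypothetical application replicas to the AZs of one Region in round-robin order and counts how many survive the loss of a single AZ. The AZ names and the round-robin policy are illustrative assumptions, not a description of how any particular AWS service places data.

```python
from collections import Counter


def spread_replicas(replica_count: int, azs: list[str]) -> list[str]:
    """Assign each replica to an AZ in round-robin order (illustrative policy)."""
    return [azs[i % len(azs)] for i in range(replica_count)]


def surviving_replicas(placement: list[str], failed_az: str) -> int:
    """Count the replicas still running when every machine in one AZ is lost."""
    return sum(1 for az in placement if az != failed_az)


if __name__ == "__main__":
    # Hypothetical AZ names; real names depend on the Region and the account.
    azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
    placement = spread_replicas(6, azs)
    print(Counter(placement))  # two replicas per AZ
    for az in azs:
        print(az, "down ->", surviving_replicas(placement, az), "of 6 replicas still serving")
```

With six replicas over three AZs, any single AZ outage leaves four replicas serving; had all six been placed in one AZ, the same outage would leave none.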

Region

An AWS Region corresponds to a geographical area⁴ that contains multiple AZs (typically 3). AWS offers more than 20 geographical Regions across the globe⁵.
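A back-of-the-envelope estimate shows why a Region is built from several AZs. Assuming, as a simplification, that AZs fail independently and each one is available with probability a, a workload replicated across n AZs is down only when all n are down at once, so its availability is 1 - (1 - a)^n. The Python sketch below evaluates that formula; the 99% input is an illustrative number, not an AWS service level.

```python
def combined_availability(az_availability: float, az_count: int) -> float:
    """Availability of a workload replicated across az_count AZs.

    Assumes AZ failures are independent. Correlated events, such as an
    outage affecting a whole Region, are deliberately not modelled.
    """
    if not 0.0 <= az_availability <= 1.0:
        raise ValueError("az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The workload is unavailable only if every AZ is unavailable simultaneously.
    return 1.0 - (1.0 - az_availability) ** az_count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "AZ(s):", f"{combined_availability(0.99, n):.6f}")
```

With a = 0.99, the estimate rises from 99% for one AZ to roughly 99.99% for two and 99.9999% for three. The independence assumption is exactly what breaks down for Region-wide events, which is what cross-Region replication addresses.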

Replication Options

Cross-Region Replication

Although a single Region offers a high level of redundancy through its multiple AZs, some risks still apply. Political instability, social unrest and military conflicts are some of the factors that could take down an entire Region.

To maximize availability and resilience, applications can also use cross-Region replication. If an entire Region goes offline, the application can continue to serve its users from another Region. Latency may increase slightly for users who were previously served by the unavailable Region, but the service is not disrupted.
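The failover decision itself can be sketched in a few lines of Python. This assumes the application already has per-Region health status and latency measurements, both hypothetical inputs here: each user is routed to the lowest-latency healthy Region, so when the usual Region goes offline, traffic shifts to the next-closest one at a somewhat higher latency.

```python
def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> str:
    """Return the healthy Region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy Region available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical latencies measured from one user's location.
    latency = {"eu-west-1": 20.0, "eu-central-1": 35.0, "us-east-1": 90.0}
    print(pick_region(latency, healthy={"eu-west-1", "eu-central-1", "us-east-1"}))
    # Simulate an outage of the closest Region: traffic moves to the next one.
    print(pick_region(latency, healthy={"eu-central-1", "us-east-1"}))
```

In practice this routing decision is often delegated to DNS-level routing rather than written by hand, but the logic is the same: prefer the closest Region and fall back when it is unhealthy.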

Some services provide built-in support for cross-Region replication, such as DynamoDB Global Tables and S3 Replication, while others require developers to implement their own replication logic.
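For DynamoDB Global Tables, a replica can be added to an existing table through the `UpdateTable` API, which boto3 exposes as `update_table` with a `ReplicaUpdates` argument (for Global Tables version 2019.11.21). The sketch below separates building the request from sending it. The table name and Regions are hypothetical, and actually calling AWS assumes configured credentials and a table that meets the Global Tables prerequisites described in the DynamoDB documentation.

```python
def replica_create_request(table_name: str, replica_region: str) -> dict:
    """Keyword arguments for DynamoDB UpdateTable that add one replica Region."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": replica_region}}],
    }


def add_replica(table_name: str, home_region: str, replica_region: str) -> dict:
    """Ask DynamoDB to create a Global Tables replica of table_name in replica_region."""
    import boto3  # imported here so the request builder works without boto3 installed

    client = boto3.client("dynamodb", region_name=home_region)
    return client.update_table(**replica_create_request(table_name, replica_region))


if __name__ == "__main__":
    # Hypothetical table and Region; printing the request does not contact AWS.
    print(replica_create_request("Orders", "eu-west-1"))
```

With credentials configured, `add_replica("Orders", "us-east-1", "eu-west-1")` would send the request; replica creation then proceeds asynchronously, and the table reports its progress through `describe_table`.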

Multi-AZ Replication

Managed services usually provide multi-AZ replication by default. This is the case for serverless services such as Lambda, DynamoDB, and S3.

Not all AWS services provide multi-AZ redundancy automatically, but for some of them the feature can be enabled relatively easily. This is the case for Relational Database Service (RDS) instances⁶ and file systems⁷, for example. For other services, such as Elastic Compute Cloud (EC2)⁸, AWS provides tutorials.
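For RDS, turning an existing single-AZ instance into a Multi-AZ deployment is a single `ModifyDBInstance` call with `MultiAZ=True`. A minimal boto3 sketch follows. The instance identifier and Region are hypothetical, and the AWS call assumes configured credentials.

```python
def multi_az_request(db_instance_id: str, apply_immediately: bool = False) -> dict:
    """Keyword arguments for RDS ModifyDBInstance that convert an instance to Multi-AZ."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MultiAZ": True,
        # When False, the change is deferred to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }


def enable_multi_az(db_instance_id: str, region: str, apply_immediately: bool = False) -> dict:
    """Send the ModifyDBInstance request to RDS in the given Region."""
    import boto3  # imported lazily; only needed when actually calling AWS

    rds = boto3.client("rds", region_name=region)
    return rds.modify_db_instance(**multi_az_request(db_instance_id, apply_immediately))


if __name__ == "__main__":
    # Hypothetical instance identifier; printing the request does not contact AWS.
    print(multi_az_request("my-database", apply_immediately=True))
```

Deferring the change with `ApplyImmediately=False` is the cautious default here, since converting an instance involves creating a standby in another AZ.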

Controlling Multi-AZ

For compute workloads running on EC2⁹, AWS offers placement groups, which let developers control how instances are placed on the underlying hardware. Partition placement groups¹⁰ divide instances into logical partitions that do not share racks with one another and can span multiple AZs within a Region. A hardware failure then affects only one partition, which suits large distributed and replicated workloads such as Kafka, Hadoop and HBase.

Cluster placement groups¹¹ pack multiple EC2 instances close together within a single AZ to reduce network latency, as typically required by High-Performance Computing (HPC) workloads.

Spread placement groups¹² distribute critical instances across distinct server racks, reducing exposure to correlated failures.
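The three placement group types correspond to the `Strategy` parameter (`cluster`, `partition` or `spread`) of the EC2 `CreatePlacementGroup` API. The boto3 sketch below builds and validates the request. The group names are hypothetical, and the AWS call assumes configured credentials. The checks reflect that `PartitionCount` only applies to the partition strategy.

```python
from typing import Optional

STRATEGIES = ("cluster", "partition", "spread")


def placement_group_request(name: str, strategy: str, partition_count: Optional[int] = None) -> dict:
    """Keyword arguments for EC2 CreatePlacementGroup."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown placement strategy: {strategy!r}")
    request = {"GroupName": name, "Strategy": strategy}
    if strategy == "partition":
        if partition_count is None or partition_count < 1:
            raise ValueError("partition strategy needs a positive partition_count")
        # Each partition is placed on its own set of racks.
        request["PartitionCount"] = partition_count
    elif partition_count is not None:
        raise ValueError("partition_count only applies to the partition strategy")
    return request


def create_placement_group(region: str, name: str, strategy: str,
                           partition_count: Optional[int] = None) -> dict:
    """Send the CreatePlacementGroup request to EC2 in the given Region."""
    import boto3  # imported lazily; only needed when actually calling AWS

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_placement_group(**placement_group_request(name, strategy, partition_count))


if __name__ == "__main__":
    # Hypothetical group names; printing the requests does not contact AWS.
    print(placement_group_request("hpc-nodes", "cluster"))
    print(placement_group_request("kafka-brokers", "partition", partition_count=3))
    print(placement_group_request("critical-web", "spread"))
```

Instances join a group when they are launched, for example by passing `Placement={"GroupName": "kafka-brokers"}` to `run_instances`.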

Footnotes:

1. Refer to the Reliability page.
2. Refer to the Scalability page.
3. AWS Availability Zones
4. Not necessarily following any political borders, but more aligned with business and commercial practices (e.g. “Asia Pacific”, “Middle East”).
5. List of AWS Regions and AZs
6. AWS RDS Multi-AZ Deployments
7. Deploying Multi-AZ File Systems
8. Increase the Availability of Your Application on Amazon EC2
9. EC2: Elastic Compute Cloud
10. Using partition placement groups for large distributed and replicated workloads in Amazon EC2
11. EC2 Cluster Placement Groups
12. EC2 Spread Placement Groups
