This is a hands-on course on how to deploy a fully Serverless web app using the AWS CDK. You will learn how to:
You can use the resources declared in this demo application as a starting point to mix and adapt to your own architectures later, which should save you quite some time.
The demo app is a public blog where anyone can read, publish, and like posts. It’s available on this link. Go ahead and publish something in the top-left corner (yellow button) and also “like” articles already published. Check out the codebase on this repo.
CDK stands for Cloud Development Kit. Think of it as CloudFormation (CF) in your preferred language (Python, TypeScript, C#, etc.). Roughly speaking, it works like this:
from aws_cdk import aws_s3
my_bucket = aws_s3.Bucket(self, 'MyBucket')
cdk deploy
In case you would like to dig deeper, AWS also has a workshop that will get your basics started. I also strongly recommend reading the official CDK documentation.
Although released as a stable project by the AWS team, many parts (a lot of the good ones) are still experimental and APIs may change in backward-incompatible ways.
It’s under constant development. During the preparation of this course, I had to upgrade my libraries three times.
Documentation is still lacking in some parts, and you will occasionally need to read the CDK source code to understand how to declare certain things.
Although we have provided an online demo, you can also deploy this app in your own AWS account:
git clone git@github.com:byrro/serverless-website-demo.git sls-demo; cd sls-demo
virtualenv -p /usr/bin/python3.8 .env; source .env/bin/activate; pip install -r requirements
export AWS_ACCOUNT_ID=1234567890
cdk deploy sls-blog; cdk deploy sls-blog-api; cdk deploy sls-blog-analytical
You can also hard-code your Account ID in the CDK project, as I’ll show in a minute.
When starting a new project from scratch, you would run cdk init --language [python|typescript|...]. This is not necessary for this demo, since the project is already created.
Deploying this architecture in the cloud and blindly believing it will work flawlessly is not reasonable. We want to be the first one to know when something is not right to act upon it as quickly as possible.
In this project, I used Dashbird for its ease of use and seamless integration. Instead of having to deploy an agent inside my code, Dashbird plugs into my Stacks through a CloudFormation template that I can deploy with the effort of one click. It not only monitors Lambda function errors, but also other resources that we’re using, such as DynamoDB tables. They even suggest insights for architectural improvements cross-referenced against industry best practices.
Finally, Dashbird offers a free-forever plan. It’s a no-brainer to try it out by registering for free.
A CDK project creates an “Application”. This app may have one or more “Stacks”. A Stack is a group of cloud resources (Lambda functions, S3 buckets, etc) that are instantiated using CDK classes. It’s also possible to have multiple applications in a single CDK project.
Creating a CDK app is as simple as:
app = core.App()
When you run cdk init --language [language], an initial application with basic boilerplate code is created for you in the project root, under app.py.
The next thing we need is an environment, which is composed of an AWS Account ID and Region:
env = core.Environment(
    account='1234567890',
    region='us-east-1',
)
Declaring an environment is not required (CDK can infer from your AWS credentials), but is a good practice. Most of us work with multiple AWS accounts. It’s easy to mess around with several projects, accounts, credentials. When we explicitly set the environment in the CDK app, it’s locked and prevents mistaken deployments.
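One way to pin the environment without committing the Account ID is to read it from the AWS_ACCOUNT_ID variable exported in the deployment steps. The following is only a sketch: the helper name resolve_env_settings and the AWS_REGION variable are assumptions made for illustration, not part of the demo project.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_env_settings(
    environ: Optional[Mapping[str, str]] = None,
    default_region: str = 'us-east-1',
) -> Dict[str, str]:
    """Read the target account and region from environment variables."""
    environ = os.environ if environ is None else environ

    # AWS_ACCOUNT_ID is the variable exported before running cdk deploy
    account = environ.get('AWS_ACCOUNT_ID', '').strip()
    if not account.isdigit():
        raise ValueError('AWS_ACCOUNT_ID must be set to a numeric account ID')

    # AWS_REGION is a hypothetical variable name used for this sketch
    region = environ.get('AWS_REGION', default_region)
    return {'account': account, 'region': region}
```

The returned values could then be passed along as core.Environment(**resolve_env_settings()). Failing early when the variable is missing keeps the protection against mistaken deployments.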
Now we declare our stacks:
from my_project.my_project_stack import MyStack

my_stack = MyStack(
    app,
    'my-stack',
    env=env,
)
This is how we instantiate our stacks for deployment. In the next section, we’ll see how to declare those stacks.
The Stack object is where we declare our AWS resources. It inherits from the core.Stack CDK class and accepts a scope (which is our app object), a string identifier, and an environment.
from aws_cdk import core


class MyStack(core.Stack):

    def __init__(
        self,
        scope: core.Construct,
        id: str,
        env: core.Environment,
        **kwargs,
    ) -> None:
        # Forward the environment so the Stack is pinned to it
        super().__init__(scope, id, env=env, **kwargs)

        # Declare AWS resources here
To declare AWS resources, we need a specific library for each service. Here’s a list of all Python libraries and their TypeScript counterparts. Other flavors are Java and .NET.
Let’s see how a basic REST API would be declared (typing expressions were removed for readability purposes):
from aws_cdk import aws_apigateway, aws_lambda, core


class MyStack(core.Stack):

    def __init__(self, scope, id, env, **kwargs):
        super().__init__(scope, id, env=env, **kwargs)

        # Lambda function whose code lives in my_lambda_folder/my_lambda.py
        my_lambda = aws_lambda.Function(
            self,
            'MyLambda',
            runtime=aws_lambda.Runtime.PYTHON_3_8,
            code=aws_lambda.Code.from_asset('my_lambda_folder'),
            handler='my_lambda.handler',
        )

        # REST API that proxies every request to the function above
        aws_apigateway.LambdaRestApi(
            self,
            'sls-blog-rest-api-gateway',
            handler=my_lambda,
        )
We first declare a Lambda function, my_lambda, and point its code to my_lambda_folder. Inside this folder, there should be a my_lambda.py file containing a function called handler. This handler function should accept Lambda invocations as usual (an event and a context object).
Next, a LambdaRestApi is declared, using my_lambda as the handler (not to be confused with the Lambda’s handler function). This will create a new API Gateway REST API integrated with my_lambda using an AWS_PROXY integration type. All HTTP requests will be routed to the Lambda function.
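With a proxy integration, the function receives the whole HTTP request in the event and must return a dictionary with a status code, headers, and a string body. Here is a minimal sketch of what my_lambda.py could contain; the echo behavior is only an illustration, not the demo’s actual handler.

```python
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Minimal handler for an API Gateway proxy integration."""
    # queryStringParameters is None when the request has no query string
    params = event.get('queryStringParameters') or {}

    body = {
        'method': event.get('httpMethod'),
        'path': event.get('path'),
        'params': params,
    }

    # The body must be a string, so the payload is serialized to JSON
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body),
    }
```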
This project comprises one application with three Stacks. They’re all declared in the app.py and sls_website_stack.py files.
Below we’ll walk through all Stacks at a high level. I encourage you to inspect the stacks file to learn how these resources are declared and integrated. For example, a Kinesis Firehose is created in one Stack and referenced in another to include its name as an environment variable for the Lambda function that will interact with it.
Except for the frontend static Stack – which is small – you will notice that resources are initialized with a None (null) value in the beginning. The reason is that, even though the CDK is generally more succinct than CloudFormation, it can still be long enough to clutter the view of the entire Stack. Having each resource declared first in one line, I can provide a short summary of everything that’s in the Stack and then instantiate the CDK classes in other methods.
class SlsBlogApiStack(core.Stack):

    def __init__(self, scope, id, env, static_stack, **kwargs):
        super().__init__(scope, id, env=env, **kwargs)

        self.static_stack = static_stack

        # SQS Queues
        self.queue_ddb_streams_dlq = None  # Dead-letter-queue for DDB streams

        # DynamoDB Tables
        self.ddb_table_blog = None  # Single-table for all blog content

        # DynamoDB Event Sources
        self.ddb_source_blog = None  # Blog table streams source

        # DynamoDB Indexes
        self.ddb_gsi_latest = None  # GSI ordering articles by timestamp

        # Lambda Functions
        self.lambda_blog = None  # Serves requests to the blog public API
        self.lambda_stream_reader = None  # Processes DynamoDB streams

        # Continues with other resources...
Notice that it takes another Stack object (static_stack) as an argument to its initialization. In the app.py file, you can see that the SlsBlogApiStack is initialized by passing the SlsBlogStack as an argument.
We use it to reference the CloudFront distribution domain (d1qmte5oc6ndq5.cloudfront.net) in the Lambda environment variables. This variable can be used to customize the HTTP response header Access-Control-Allow-Origin to comply with CORS standards. This illustrates one way to easily integrate and reference information from one Stack into another within a CDK project.
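Inside the Lambda function, building the CORS header from that variable might look like the sketch below. The variable name CLOUDFRONT_DOMAIN and the helper cors_headers are assumptions for illustration; check the repo for the names the demo actually uses.

```python
import os
from typing import Dict, Mapping, Optional


def cors_headers(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build CORS response headers from the domain set by the CDK Stack."""
    environ = os.environ if environ is None else environ

    # CLOUDFRONT_DOMAIN is a hypothetical environment variable name
    domain = environ.get('CLOUDFRONT_DOMAIN', '')
    if not domain:
        # Without a configured domain, no origin is allowed
        return {}

    return {'Access-Control-Allow-Origin': f'https://{domain}'}
```

Restricting the allowed origin to the CloudFront domain, rather than using a wildcard, limits which sites can have browsers read the API’s responses.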
At the end of the initialization, another method is called to instantiate the CDK classes for each resource and configure their parameters.
self.create_cdk_resources()
Next, we’ll walk through each of our project’s Stacks.
Our focus is on the Serverless backend, so the frontend here is terribly rough and simple. It’s stored in an S3 Bucket and distributed through a CloudFront CDN.
CDK has a helpful class called BucketDeployment. It takes the contents of a directory and syncs them to an S3 bucket. In this case, we stored the frontend code in the website_static folder.
aws_s3_deployment.BucketDeployment(
    self,
    'SlsBlogStaticS3Deployment',
    sources=[aws_s3_deployment.Source.asset('website_static')],
    destination_bucket=static_bucket,
    distribution=cdn,
)
Our backend consists of an API Gateway (REST) connected to a monolithic Lambda function. Microservices receive a lot of press, but you shouldn’t always break your applications into several functions. A monolith is just fine, and sometimes even recommended.
This API and Lambda support a single endpoint (with GET and POST methods) that takes an “action” query string parameter, which accepts three values:
get-latest-articles
like-article
publish-article
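A monolithic function can route these actions with a plain dictionary lookup. The sketch below shows the idea; the three action functions are hypothetical stubs with placeholder return values, not the demo’s implementation.

```python
import json
from typing import Any, Callable, Dict


def get_latest_articles(event: Dict[str, Any]) -> Dict[str, Any]:
    return {'articles': []}  # Stub: would read from the database


def like_article(event: Dict[str, Any]) -> Dict[str, Any]:
    return {'liked': True}  # Stub: would increment a counter


def publish_article(event: Dict[str, Any]) -> Dict[str, Any]:
    return {'published': True}  # Stub: would store a new article


# Maps the "action" query string value to the function that serves it
ACTIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    'get-latest-articles': get_latest_articles,
    'like-article': like_article,
    'publish-article': publish_article,
}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    params = event.get('queryStringParameters') or {}
    action = params.get('action')
    func = ACTIONS.get(action)

    if func is None:
        status, payload = 400, {'error': f'Unknown action: {action}'}
    else:
        status, payload = 200, func(event)

    return {'statusCode': status, 'body': json.dumps(payload)}
```

Adding a new action then means writing one function and registering it in the dictionary, with no change to the API Gateway configuration.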
Here’s the power of the CDK model. We can create a REST API with 10 lines of code:
rest_api_blog = aws_apigateway.LambdaRestApi(
    self,
    'sls-blog-rest-api-gateway',
    handler=lambda_blog,  # Previously declared Lambda function
    deploy_options=aws_apigateway.StageOptions(
        stage_name='api',
        throttling_rate_limit=lambda_param_max_concurrency,
        logging_level=aws_apigateway.MethodLoggingLevel.INFO,
    ),
)
One nice little thing is that Lambda memory is used as a cache for the latest articles. We load the cache container outside the Lambda handler function. It remains in memory even after an invocation ends and is available for subsequent requests. Learn more here about how to use Lambda as a cache mechanism.
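import time
from typing import Dict, Union

MAX_CACHE_AGE: int = 120  # In seconds

# Declared at module level, outside the handler, so it survives invocations
CACHE_LATEST_ARTICLES: Dict[str, Union[float, list]] = {
    'last_update': time.time(),
    'articles': [],
}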
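To make the mechanism concrete, here is a sketch of how such a container could be read and refreshed. The function get_latest_articles and its fetch argument (a stand-in for the database query) are assumptions for illustration, not code taken from the demo.

```python
import time
from typing import Callable, Dict, List, Optional, Union

MAX_CACHE_AGE: int = 120  # In seconds

# Module-level state stays in memory while the Lambda container is warm
CACHE_LATEST_ARTICLES: Dict[str, Union[float, list]] = {
    'last_update': time.time(),
    'articles': [],
}


def get_latest_articles(
    fetch: Callable[[], List[dict]],
    now: Optional[float] = None,
) -> List[dict]:
    """Return cached articles, refreshing them when empty or too old."""
    now = time.time() if now is None else now
    age = now - CACHE_LATEST_ARTICLES['last_update']

    if not CACHE_LATEST_ARTICLES['articles'] or age > MAX_CACHE_AGE:
        # Cache miss: call the (hypothetical) data source and store the result
        CACHE_LATEST_ARTICLES['articles'] = fetch()
        CACHE_LATEST_ARTICLES['last_update'] = now

    return CACHE_LATEST_ARTICLES['articles']
```

Each warm container keeps its own copy, so different containers may serve slightly different results for up to MAX_CACHE_AGE seconds. That trade-off is usually acceptable for a list of latest articles.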
All the data is stored in DynamoDB (DDB) using a single-table design, in on-demand mode. The site only shows the latest blog articles and items get auto-deleted by DDB after a few days by setting a time-to-live attribute.
self.ddb_attr_time_to_live = 'time-to-live'
self.ddb_param_max_parallel_streams = 5

self.ddb_table_blog = aws_dynamodb.Table(
    self,
    'sls-blog-dynamo-table',
    partition_key=aws_dynamodb.Attribute(
        name='id',
        type=aws_dynamodb.AttributeType.STRING,
    ),
    billing_mode=aws_dynamodb.BillingMode.PAY_PER_REQUEST,
    point_in_time_recovery=True,
    removal_policy=core.RemovalPolicy.DESTROY,
    time_to_live_attribute=self.ddb_attr_time_to_live,
    stream=aws_dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
)
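For the auto-deletion to work, each item must carry the time-to-live attribute as a number holding the expiration time in Unix epoch seconds. A sketch of how an item could be stamped before being written follows; the helper with_expiration and the three-day lifetime are assumptions for illustration.

```python
import time
from typing import Any, Dict, Optional

DDB_ATTR_TIME_TO_LIVE = 'time-to-live'  # Must match the table's TTL attribute
ARTICLE_LIFETIME_DAYS = 3  # Hypothetical value for this sketch


def with_expiration(
    item: Dict[str, Any],
    now: Optional[float] = None,
    days: int = ARTICLE_LIFETIME_DAYS,
) -> Dict[str, Any]:
    """Return a copy of the item with its expiration timestamp set."""
    now = time.time() if now is None else now

    # DynamoDB expects the TTL as a number of seconds since the Unix epoch
    expires_at = int(now) + days * 24 * 60 * 60
    return {**item, DDB_ATTR_TIME_TO_LIVE: expires_at}
```

Keep in mind that DynamoDB deletes expired items in the background, so an item may remain readable for some time after its timestamp has passed.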
The DDB table also has a GSI (Global Secondary Index) that makes it easier to retrieve articles ordered by date for the site:
self.ddb_table_blog.add_global_secondary_index(
    index_name='latest-blogs',
    partition_key=aws_dynamodb.Attribute(
        name='item-type',
        type=aws_dynamodb.AttributeType.STRING,
    ),
    sort_key=aws_dynamodb.Attribute(
        name='publish-timestamp',
        type=aws_dynamodb.AttributeType.NUMBER,
    ),
    projection_type=aws_dynamodb.ProjectionType.ALL,
)
Modifications to DDB items generate streams that are processed by a second Lambda function. These streams are then repackaged and sent to a Kinesis Firehose stream processor.
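The repackaging step can be sketched as a pure function that turns a DynamoDB stream event into Firehose records. This is an illustration under simplifying assumptions: only string and number attributes are handled, and the names simplify, to_firehose_records, and the event-name field are hypothetical.

```python
import json
from typing import Any, Dict, List


def simplify(image: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Convert DynamoDB-typed attributes (strings and numbers) to plain values."""
    plain: Dict[str, Any] = {}
    for name, typed in image.items():
        if 'S' in typed:
            plain[name] = typed['S']
        elif 'N' in typed:
            # Numbers arrive as strings in stream records
            try:
                plain[name] = int(typed['N'])
            except ValueError:
                plain[name] = float(typed['N'])
    return plain


def to_firehose_records(event: Dict[str, Any]) -> List[Dict[str, bytes]]:
    """Repackage stream records as newline-delimited JSON for Firehose."""
    records = []
    for record in event.get('Records', []):
        image = record.get('dynamodb', {}).get('NewImage')
        if image is None:
            # REMOVE events carry no new image, so they are skipped here
            continue
        row = simplify(image)
        row['event-name'] = record.get('eventName')
        records.append({'Data': (json.dumps(row) + '\n').encode('utf-8')})
    return records
```

The resulting list has the shape that boto3’s Firehose client expects in the Records argument of put_record_batch. Keep in mind that each call is limited in the number of records it accepts, so large events may need to be sent in chunks.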
DDB doesn’t provide the flexibility that SQL databases offer and many choose Aurora Serverless, for example. Although Aurora is a great service, personally I prefer DDB for its simplicity and reliable, consistent performance. But sometimes we do need to run analytical queries, those with aggregations and on-the-fly filters. For that, we’ll be using Athena (more in the next Stack).
A Kinesis Firehose Stream is responsible for batching data inserted/modified in DDB, converting them to Apache Parquet format and storing in dedicated S3 buckets. From S3, we create a Data Lake with AWS Glue (used to declare our data schemas) and Athena (used to query the data).
Athena is extremely powerful. We can use SQL SELECT statements (with some limitations) to query terabytes of data and pay on-demand ($0.005 per GB of data scanned). Using Parquet not only improves query speed, but also reduces cost by minimizing the amount of data Athena needs to scan for each query.
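Since the bill depends only on the data scanned, the saving from a columnar format is easy to estimate. The sketch below uses the price quoted above; the scan sizes in the usage note are made-up figures for illustration, not measurements from this project.

```python
PRICE_PER_GB_SCANNED = 0.005  # USD, the on-demand figure quoted above


def athena_query_cost(gb_scanned: float) -> float:
    """Estimate the cost of one query from the amount of data it scans."""
    return round(gb_scanned * PRICE_PER_GB_SCANNED, 4)
```

For instance, if a query had to scan 1,000 GB of raw data, athena_query_cost(1000) gives 5.0 dollars. If Parquet let the same query read only the 50 GB of columns it needs, the estimate drops to 0.25 dollars.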
Queries that are impossible or expensive/slow in DynamoDB, such as aggregations and JOINs, are fast and cheap in Athena. The two services complement each other well, giving your application optimized transactional storage and flexible analytical querying capabilities.
We can use Athena to query all articles ever published and cross-reference with likes and HTTP metadata (source IP address, country, device type, etc). Even articles that were already expired by DynamoDB TTL (time-to-live) would continue to be available in the Data Lake.
For example, which countries are liking the most articles? In the AWS Console, we get something like this:
Queries can also be executed programmatically with Athena API or AWS SDKs (e.g. Python’s boto3) to integrate anywhere we need this data.
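A programmatic query follows three steps: start the execution, poll until it finishes, then fetch the results. Here is a sketch using the boto3 Athena client methods start_query_execution, get_query_execution, and get_query_results. The function name run_athena_query is an assumption, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
import time
from typing import Any, Dict, List


def run_athena_query(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[Dict[str, Any]]:
    """Run a query on Athena and return the raw result rows."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={'Database': database},
        ResultConfiguration={'OutputLocation': output_location},
    )
    query_id = started['QueryExecutionId']

    # Athena runs queries asynchronously, so we poll for a final state
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(poll_seconds)

    if state != 'SUCCEEDED':
        raise RuntimeError(f'Athena query {query_id} ended as {state}')

    results = client.get_query_results(QueryExecutionId=query_id)
    return results['ResultSet']['Rows']
```

In practice, the client would come from boto3.client('athena'), and output_location would be an S3 path where Athena can write result files. This sketch reads only the first page of results; larger result sets would need pagination.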
Athena also supports JOINs. Here’s an example of joining articles and HTTP metadata to analyze the most popular authors among readers of a particular country:
CDK can deploy one Stack at a time. Since we have three, it’s necessary to specify which one when running the cdk deploy command. We do that by passing in the Stack ID as a CLI argument. For example, the following command will deploy the SlsBlogApiStack (id: sls-blog-api):
cdk deploy sls-blog-api
Since all Stacks involve some type of permission granting, CDK asks for confirmation before deploying those resources. You can review the permissions requested and hit y when it’s good to go.
We’ve covered how to structure CDK apps and add a bunch of AWS resources to deploy with a simple cdk deploy command. If you’re new to the CDK, as suggested earlier in the article, I strongly recommend following the AWS CDK workshop and documentation.
Keep an eye on future publications as well: Dashbird is releasing more examples and tutorials on getting the most out of AWS serverless services through infrastructure automation, with the CDK or other tools.