How to Monitor Your AWS RDS Instances

Tobias Schmidt

June 24th, 2022

Even though NoSQL databases like Amazon’s own DynamoDB are very popular today, for many business use cases, there’s almost no way around using a traditional relational database.

Amazon Relational Database Service (RDS), released back in October 2009, is one of Amazon’s first cloud services and can therefore be seen as a very mature service. RDS’s database instances run on top of EC2, meaning its configuration and monitoring requirements are greater than those of fully managed services.

In this article, we’ll cover all the steps for creating proper monitoring for your RDS instances by starting with metrics and performance guidelines. We will also compare the monitoring options offered by AWS with Dashbird’s simple but nevertheless all-encompassing approach.

From Metrics to Runbooks

It’s always easy to jump right into taking action. But for most cases, including monitoring, it’s good to really think about what you want to achieve up front:

What needs to be monitored?
How frequently should it be monitored?
What tools will you use?
Which thresholds are critical?
When and how should somebody be notified?
What actions will be taken to handle such events?

Create a monitoring plan that answers these basic questions so that you don’t end up monitoring just for the sake of monitoring but rather to fulfill a specific and useful purpose.
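One way to keep such a plan honest is to write it down as data, so that a rule without a defined action stands out. The following is only a minimal sketch: the MonitoringRule class, its field names, the thresholds, and the notification targets are all made up for illustration and are not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRule:
    """One row of a monitoring plan; each field answers one planning question."""
    metric: str          # what is monitored
    period_seconds: int  # how frequently it is evaluated
    tool: str            # which tool collects it
    threshold: float     # which value counts as critical
    notify: str          # who is notified, and through which channel
    runbook: str         # what action is taken when the threshold is breached


# Hypothetical example values, chosen only to show the shape of a plan.
PLAN = [
    MonitoringRule("CPUUtilization", 60, "CloudWatch", 80.0,
                   "sns:ops-alerts", "Check slow queries, then consider scaling"),
    MonitoringRule("DatabaseConnections", 60, "CloudWatch", 500.0,
                   "slack:#db-oncall", "Inspect connection pooling settings"),
]


def rules_without_runbook(plan):
    """Return the metric names of rules that alert but define no action."""
    return [rule.metric for rule in plan if not rule.runbook.strip()]
```

Calling rules_without_runbook(PLAN) returns an empty list as long as every rule names an action, which is a cheap check against monitoring just for the sake of monitoring.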

What to Monitor

RDS publishes different types of metrics to CloudWatch, including:

Metrics for your DB instances, e.g., CPU utilization, the number of database connections, available memory, network throughput, and read latency.
Performance Insights metrics, e.g., the number of active sessions for the database engine.
Real-time data about the operating system on which the DB instances are run — more on this later.
Usage metrics for RDS service quotas in your AWS account, e.g., the total allocated storage of all your database instances or the number of instances itself.

Common Sources of Performance Issues

We now know that there’s a huge set of available metrics and performance indicators, which can be overwhelming at first. Fortunately, there are very common indicators of performance issues, which require a subset of metrics:

High CPU and RAM load: high CPU usage or low available memory can be, but doesn’t have to be, a sign that action is needed. For high throughput or concurrency, it can be expected and within norms.
Very high or low traffic: keep an eye on the correlation between traffic and DB throughput to determine acceptable traffic patterns.
An increasing number of connections combined with a decrease in instance performance.
Drastic changes to the IOPS metric: your baseline greatly depends on the disk and server specifications. If values are consistently different from expectations, you should start investigating.

It’s important to keep in mind that most of those metrics cannot be considered in isolation and require a baseline to compare against to have a clear indication of which values are acceptable and within the norm.
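To make the idea of a baseline concrete, here is a minimal sketch that compares a new reading against the mean and standard deviation of earlier samples. The function name, the three-sigma cutoff, and the sample numbers are our own assumptions; a real baseline would also need to account for daily and weekly traffic patterns.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, max_sigma=3.0):
    """Return True if `current` lies more than `max_sigma` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > max_sigma


# Hypothetical read IOPS samples collected during normal operation.
read_iops_history = [1000, 1100, 950, 1050, 1000]
```

With these samples, a reading of 1080 is treated as normal, while a reading of 2500 is flagged for investigation.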

Notifications and Runbooks

There’s no immediate value in just collecting records of metrics and performance indicators. We need to either actively observe them or, better yet, get notified about detected outliers and anomalies. And even then, it’s having concrete plans and taking action that makes a monitoring solution valuable.

This means that if CPU and memory values regularly breach their thresholds while traffic patterns stay within the expected ranges, you should either scale horizontally or vertically, or investigate workloads to find and eliminate the bottlenecks that cause the CPU and memory spikes.

AWS’s Monitoring Options for RDS

Let’s dive into all the options AWS provides us in more detail.

As described previously, there are three different types of monitoring: service monitoring via CloudWatch, database monitoring via Performance Insights (also submitted to CloudWatch), and OS monitoring with Enhanced Monitoring.

Service Monitoring via CloudWatch

Without any further configuration, RDS sends metrics to CloudWatch in one-minute intervals. Those metrics are stored for 15 days, enabling you to run analytics for historical data to gain service performance insights.

With CloudWatch alarms and SNS, you can define thresholds that will trigger notifications if breached, e.g., if a database’s memory consumption is above 80% for more than five minutes.
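The memory example above could be sketched with boto3 roughly as follows. RDS reports memory as the FreeableMemory metric in bytes rather than as a percentage, so this sketch assumes you know the instance’s total memory and treats "less than 20% free" as a stand-in for "more than 80% used". The function names, the alarm name, and the topic ARN passed in are placeholders.

```python
def memory_alarm_params(db_instance_id, total_memory_bytes, sns_topic_arn):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{db_instance_id}-low-freeable-memory",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeableMemory",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 60,             # evaluate one-minute data points
        "EvaluationPeriods": 5,   # five consecutive minutes below the threshold
        "Threshold": total_memory_bytes * 0.2,  # assumption: 20% free == 80% used
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],        # SNS topic that receives the alert
    }


def create_alarm(params):
    """Create the alarm; needs boto3 and AWS credentials, so it is kept separate."""
    import boto3  # imported lazily so the builder above works without boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter builder separate from the API call makes it easy to review or unit-test the alarm definition before anything is created in your account.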

Monitoring Database Load with Performance Insights

RDS’s default metrics only help you to visualize and analyze the general load on the database; they do not provide detailed insights into what is causing the load for certain types of workloads.

With Performance Insights, you’re able to filter the load in a very fine-grained manner, for example, by SQL statement. This will help you to determine major contributors to heavy loads or bottlenecks affecting your service’s performance.

Performance Insights needs to be enabled explicitly for your DB instance or Multi-AZ cluster. If you want to keep data collected by Performance Insights for longer than seven days, you’ll incur an additional charge.
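Once Performance Insights is enabled, the load per SQL statement can also be queried programmatically. The sketch below builds a request for the boto3 "pi" client’s get_resource_metrics call, grouping the db.load.avg metric by db.sql. The function names and the default time window are our own choices, and the identifier is expected to be the instance’s resource ID (DbiResourceId), not its name.

```python
from datetime import datetime, timedelta, timezone


def top_sql_query_params(resource_id, hours=1, limit=5):
    """Build a Performance Insights request for the top SQL statements by load."""
    end = datetime.now(timezone.utc)
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # DbiResourceId of the DB instance
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "PeriodInSeconds": 60,
        "MetricQueries": [{
            "Metric": "db.load.avg",  # average active sessions
            "GroupBy": {"Group": "db.sql", "Limit": limit},
        }],
    }


def fetch_top_sql(params):
    """Run the query; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3
    return boto3.client("pi").get_resource_metrics(**params)
```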

Operating System Monitoring with Enhanced Monitoring

In addition to monitoring your database instances, you can also monitor the underlying operating system. The major difference between the default CloudWatch monitoring and Enhanced Monitoring lies in how the metrics are collected: Enhanced Monitoring collects statistics via an agent running on the DB instance, whereas CloudWatch gathers its metrics from the hypervisor that creates and runs the virtual machines.

Enhanced Monitoring collects a lot of additional metrics from the OS in real time. This is useful if you’re interested in the different processes or threads that are using the CPU.

An important fact here is that Enhanced Monitoring stores its metrics in CloudWatch Logs. This means the data transfer and storage of CloudWatch Logs will increase, and you’ll incur additional charges. Shorter monitoring intervals (meaning a higher frequency of monitoring) or a higher number of DB instances will increase the cost.
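Because Enhanced Monitoring data arrives as JSON log events, the per-process view can be extracted with a few lines of code. The following sketch assumes that each event carries a processList array whose entries have name and cpuUsedPc fields; verify these field names against your own log events before relying on them. The sample payload is invented for illustration.

```python
import json


def top_cpu_processes(log_message, n=3):
    """Return (name, cpuUsedPc) pairs for the n busiest processes in one event."""
    event = json.loads(log_message)
    processes = event.get("processList", [])
    ranked = sorted(processes, key=lambda p: p.get("cpuUsedPc", 0.0), reverse=True)
    return [(p.get("name", "?"), p.get("cpuUsedPc", 0.0)) for p in ranked[:n]]


# Invented sample event; real events contain many more fields.
SAMPLE_EVENT = json.dumps({
    "instanceID": "example-db",
    "processList": [
        {"name": "postgres: writer", "cpuUsedPc": 1.5},
        {"name": "postgres: app query", "cpuUsedPc": 42.0},
        {"name": "OS processes", "cpuUsedPc": 0.3},
    ],
})

busiest = top_cpu_processes(SAMPLE_EVENT, n=2)
```

For the sample event, busiest contains the application query process first, followed by the writer process.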

Simple and Complete Monitoring with Dashbird

Since its large update in February, Dashbird now also supports AWS RDS. With a simple one-click integration, Dashbird will collect metrics via AWS APIs to display performance metrics and insights about your RDS instances, clusters, and proxies without having any performance impact on your databases.

Insights per Cluster and Proxy

Dashbird visualizes cluster metrics like CPU and memory usage, network traffic, number of connections, reads, writes, used storage, and lag.

These single-click glances will help you easily find outliers and anomalies and do not require any additional effort. After Dashbird successfully connects to your AWS account and retrieves historical data, everything’s usable out of the box.

Well-Architected Insights and Tips

In addition to key performance metrics, Dashbird also supports you with industry best practices so that you don’t miss out on important configurations and commonly used patterns.

Simple Notifications via Slack, Email, or SNS

Natively setting up notifications for AWS via CloudWatch and SNS requires a lot of configuration and boilerplate code if you’re using Infrastructure as Code.

With Dashbird, notifications for single or multiple channels of your choice can be configured with only a few clicks.
