Even though NoSQL databases like Amazon’s own DynamoDB are very popular today, for many business use cases, there’s almost no way around using a traditional relational database.
Amazon Relational Database Service (RDS), released back in October 2009, is one of Amazon’s first cloud services and can therefore be seen as a very mature service. RDS’s database instances run on top of EC2, so they need more configuration and monitoring than fully managed services do.
In this article, we’ll cover all the steps for creating proper monitoring for your RDS instances by starting with metrics and performance guidelines. We will also compare the monitoring options offered by AWS with Dashbird’s simple but nevertheless all-encompassing approach.
From Metrics to Runbooks
It’s always easy to jump right into taking action. But for most cases, including monitoring, it’s good to really think about what you want to achieve up front:
- What needs to be monitored?
- How frequently should it be monitored?
- What tools will you use?
- Which thresholds are critical?
- When and how should somebody be notified?
- What actions will be taken to handle such events?
Create a monitoring plan that answers these basic questions so that you don’t end up monitoring just for the sake of monitoring but rather to fulfill a specific and useful purpose.
What to Monitor
RDS publishes different types of metrics to CloudWatch, including:
- Metrics for your DB instances, e.g., CPU utilization, the number of database connections, available memory, network throughput, and read latency.
- Performance Insights metrics, e.g., the number of active sessions for the database engine.
- Real-time data about the operating system on which the DB instances are run — more on this later.
- Usage metrics for RDS service quotas in your AWS account, e.g., the total allocated storage of all your database instances or the number of instances.
Common Sources of Performance Issues
We now know that there’s a huge set of available metrics and performance indicators, which can be overwhelming at first. Fortunately, the most common indicators of performance issues only require a subset of those metrics:
- High CPU and RAM load: high CPU usage or low available memory can be, but doesn’t have to be, a sign that you need to take measures. Under high throughput or concurrency, it can be expected and within norms.
- Very high or low traffic: keep an eye on the correlation between traffic and DB throughput to determine acceptable traffic patterns.
- An increasing number of connections combined with a decrease in instance performance.
- Drastic changes to the IOPS metric: your baseline greatly depends on the disk and server specifications. If values consistently deviate from that baseline, you should start investigating.
It’s important to keep in mind that most of those metrics cannot be considered in isolation and require a baseline to compare against to have a clear indication of which values are acceptable and within the norm.
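One way to make "within the norm" concrete is to compare each new reading against a baseline computed from recent history. The following is a minimal sketch in Python, assuming a simple mean and standard deviation baseline. The function names, the sample values, and the three-sigma cutoff are illustrative assumptions, not something RDS or CloudWatch prescribes.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical readings (e.g. CPU % per minute) as (mean, std dev)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return mean(samples), stdev(samples)


def is_outlier(value, baseline, max_sigmas=3.0):
    """Return True if value is more than max_sigmas std devs away from the mean."""
    avg, sd = baseline
    if sd == 0:
        # Completely flat history: any change at all counts as a deviation.
        return value != avg
    return abs(value - avg) / sd > max_sigmas


# Hypothetical CPU utilization history in percent.
history = [41.0, 39.5, 40.2, 42.1, 38.9, 40.7, 41.3, 39.8]
baseline = build_baseline(history)

print(is_outlier(40.5, baseline))  # a reading close to the usual load
print(is_outlier(85.0, baseline))  # a reading far above the usual load
```

A real baseline would likely also account for time of day and day of week, since database load is rarely uniform.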
Notifications and Runbooks
There’s no immediate value in just collecting records of metrics and performance indicators. We need to either actively observe them or, better yet, get notified about detected outliers and anomalies. And even then, it’s having concrete plans and taking action that makes a monitoring solution valuable.
For example, if CPU and memory values regularly breach their thresholds while traffic patterns stay within the expected ranges, you should either scale the instance horizontally or vertically, or investigate workloads to find and eliminate the bottlenecks that are causing the CPU and memory spikes.
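A runbook entry like this can be written down as a small decision function so that it's unambiguous who does what. The sketch below is only an illustration: the function name and the returned labels are hypothetical, and the branch for unusual traffic is our own assumption, since the text above only covers the case where traffic is within the expected range.

```python
def triage(cpu_breached, memory_breached, traffic_within_expected):
    """Map observed symptoms to a suggested runbook step (illustrative only)."""
    resource_pressure = cpu_breached or memory_breached
    if not resource_pressure:
        return "no action"
    if traffic_within_expected:
        # Normal traffic but high load: the instance is undersized
        # or a workload is inefficient.
        return "scale or investigate workloads"
    # Assumption: when pressure coincides with unusual traffic,
    # look at the traffic source before resizing anything.
    return "review traffic pattern"


print(triage(cpu_breached=True, memory_breached=True, traffic_within_expected=True))
print(triage(cpu_breached=False, memory_breached=False, traffic_within_expected=True))
```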
AWS’s Monitoring Options for RDS
Let’s dive into all the options AWS provides us in more detail.
As described previously, there are three different types of monitoring: service monitoring via CloudWatch, database monitoring via Performance Insights (also submitted to CloudWatch), and OS monitoring with Enhanced Monitoring.
Service Monitoring via CloudWatch
Without any further configuration, RDS sends metrics to CloudWatch in one-minute intervals. CloudWatch keeps these one-minute data points for 15 days, enabling you to analyze historical data and gain insights into service performance.
Together with CloudWatch alarms and SNS, you can define thresholds that trigger notifications when breached, e.g., if a database’s memory consumption is above 80% for more than five minutes.
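To give an idea of what such a threshold looks like in practice, here is a hedged sketch that builds the parameters for a CloudWatch alarm. RDS reports memory as the FreeableMemory metric in bytes rather than as a percentage, so "above 80% used" is expressed here as "less than 20% free". The function name, the instance identifier, the topic ARN, and the assumption that you supply the instance's total memory yourself are all illustrative.

```python
def low_memory_alarm_params(db_instance_id, total_memory_bytes, sns_topic_arn):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"{db_instance_id}-low-freeable-memory",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeableMemory",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 60,            # evaluate one-minute data points
        "EvaluationPeriods": 5,  # breached for five consecutive minutes
        "Threshold": total_memory_bytes * 0.2,  # 80% used means 20% free
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the notification
    }


# Hypothetical values for a 16 GiB instance.
params = low_memory_alarm_params(
    db_instance_id="orders-db",
    total_memory_bytes=16 * 1024**3,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:rds-alerts",
)
print(params["AlarmName"])

# With the third-party boto3 package and valid credentials, this could be applied as:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```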
Monitoring Database Load with Performance Insights
RDS’s default metrics only help you to visualize and analyze the general load on the database; they do not give you detailed insights into what is causing the load for certain types of workloads.
With Performance Insights, you’re able to filter loads in a very fine-grained manner, for example, by SQL statement. This will help you to determine major contributors to heavy loads or bottlenecks affecting your service’s performance.
Performance Insights needs to be enabled explicitly for your DB instance or Multi-AZ cluster. If you want to keep data collected by Performance Insights for longer than seven days, you’ll incur an additional charge.
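As a rough sketch of what enabling it explicitly can look like when done through the API, the snippet below builds the parameters for the RDS ModifyDBInstance call. The helper names and the instance identifier are hypothetical, and the seven-day constant simply mirrors the free retention mentioned above.

```python
FREE_RETENTION_DAYS = 7  # retention that comes without an additional charge


def performance_insights_params(db_instance_id, retention_days=FREE_RETENTION_DAYS):
    """Build keyword arguments for the RDS ModifyDBInstance call."""
    return {
        "DBInstanceIdentifier": db_instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
    }


def incurs_retention_charge(retention_days):
    """Return True if the chosen retention goes beyond the free seven days."""
    return retention_days > FREE_RETENTION_DAYS


params = performance_insights_params("orders-db")  # hypothetical instance name
print(params)
print(incurs_retention_charge(params["PerformanceInsightsRetentionPeriod"]))

# With the third-party boto3 package and valid credentials, this could be applied as:
#   boto3.client("rds").modify_db_instance(**params)
```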
Operating System Monitoring with Enhanced Monitoring
In addition to monitoring your database instances, you can also monitor the underlying operating system. The major difference between the default CloudWatch monitoring and Enhanced Monitoring lies in how the metrics are collected: Enhanced Monitoring gathers statistics via an agent running on the DB instance, whereas CloudWatch gets its numbers from the hypervisor that creates and runs the virtual machines.
Enhanced Monitoring collects a lot of additional metrics from the OS in real time. This is useful if you’re interested in the different processes or threads that are using the CPU.
An important fact here is that Enhanced Monitoring stores its metrics in CloudWatch Logs. This means your CloudWatch Logs data transfer and storage will increase, and you’ll incur an additional charge. Shorter monitoring intervals (meaning a higher frequency of monitoring) or a higher number of DB instances will increase the cost.
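To get a feeling for how interval and instance count drive the volume, the following back-of-the-envelope sketch counts how many metric samples end up in CloudWatch Logs per day. It is not a price calculation: the sample count is only a proxy for log volume, and the function name is our own.

```python
# Granularities RDS accepts for Enhanced Monitoring, in seconds (0 disables it).
VALID_INTERVALS = (1, 5, 10, 15, 30, 60)
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds, instance_count=1):
    """Number of Enhanced Monitoring samples written per day across all instances."""
    if interval_seconds not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    return SECONDS_PER_DAY // interval_seconds * instance_count


print(samples_per_day(60))                   # 1440 samples for one instance
print(samples_per_day(1, instance_count=3))  # 259200 samples for three instances
```

Going from a 60-second to a 1-second interval multiplies the number of samples by 60, which is why it's worth choosing the coarsest interval that still answers your questions.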
Simple and Complete Monitoring with Dashbird
Since its large update in February, Dashbird now also supports AWS RDS. With a simple one-click integration, Dashbird will collect metrics via AWS APIs to display performance metrics and insights about your RDS instances, clusters, and proxies without having any performance impact on your databases.
Insights per Cluster and Proxy
Dashbird visualizes cluster metrics like CPU and memory usage, network traffic, number of connections, reads, writes, used storage, and lag.
These at-a-glance views help you easily find outliers and anomalies without any additional effort. After Dashbird successfully connects to your AWS account and retrieves historical data, everything’s usable out of the box.
Well-Architected Insights and Tips
In addition to key performance metrics, Dashbird also supports you with industry best practices so that you don’t miss out on important configurations and commonly used patterns.
Simple Notifications via Slack, Email, or SNS
Natively setting up notifications for AWS via CloudWatch and SNS requires a lot of configuration and boilerplate code if you’re using Infrastructure as Code.
With Dashbird, notifications for single or multiple channels of your choice can be configured with only a few clicks.