We are thrilled to announce our support for AWS RDS instances, expanding our DBaaS optimization capabilities to AWS (we announced our support for Azure SQL earlier this year).
Enterprises are deploying more and more workloads in the cloud and moving existing workloads there as they implement digital transformation. The platform-as-a-service (PaaS) market is expected to grow from about $25 billion in 2018 to about $92 billion in 2023, a 2018–2023 compound annual growth rate (CAGR) of 29.5%. Data services make up over half of the platform market and are expected to grow even faster, at a CAGR of 33%, to about $53 billion in 2023. (Source: IDC White Paper). Amazon Relational Database Service (RDS) is one such data service.
AWS RDS was launched in October 2009 and today is one of Amazon’s most adopted PaaS services, giving customers the ability to deploy full database servers in the cloud. Customers using RDS are enjoying simplified database administration and maintenance as well as high availability, scalability, and agility in the cloud.
Amazon RDS supports the following database engines:
- Amazon Aurora
- MySQL
- MariaDB
- PostgreSQL
- Oracle
- Microsoft SQL Server
Amazon RDS instances can cost anywhere from $12 to $17,000 per month, making it extremely important to continuously choose the right configuration for your RDS instances to assure their performance while minimizing cloud cost. However, database administrators (DBAs) struggle to make the right resourcing decisions for their RDS instances because there are simply too many variables to consider at all times; in most cases, the result is overprovisioned resources and cloud cost overruns.
To solve this challenge, Turbonomic now delivers AWS RDS optimization as part of its continuous cloud optimization feature suite. Turbonomic provides RDS scaling actions that are performance-aware, data-driven, trustworthy, and safe. Moreover, customers can automate these actions for continuous real-time optimization, assuring the performance of their RDS instances and the applications using them while minimizing RDS cost.
Smarter, multidimensional AWS RDS scaling
There are two aspects to RDS scaling: compute and storage. Turbonomic recommends actions on RDS instances that combine both aspects.
The key metrics Turbonomic looks at to scale RDS instances are VCPU, VMem, DB Cache Hit Rate, Storage Amount, and IOPS. It continuously monitors these metrics and suggests scale up or scale down actions to bring them to the desired state.
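To make the idea concrete, here is a deliberately simplified sketch of a threshold-based scaling decision. This is not Turbonomic's actual analysis (which weighs multiple metrics, cost, and constraints together); the metric names, utilization values, and thresholds below are illustrative assumptions only.

```python
# Simplified sketch: per-metric threshold-based scaling decision.
# NOT Turbonomic's actual algorithm; thresholds are illustrative assumptions.

def scaling_action(utilization: float, low: float = 0.2, high: float = 0.7) -> str:
    """Suggest a scaling direction to bring a metric back into its desired band."""
    if utilization > high:
        return "scale up"    # congestion risk: add resources
    if utilization < low:
        return "scale down"  # underutilization: reclaim resources to save cost
    return "no action"       # metric is already in the desired state

# Example: evaluate each monitored metric independently (values are made up)
metrics = {"VCPU": 0.85, "VMem": 0.45, "IOPS": 0.10}
actions = {name: scaling_action(u) for name, u in metrics.items()}
```

In practice, a real analysis must reconcile conflicting per-metric suggestions into a single safe action, which is exactly what the combined compute-and-storage actions described above do.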
Depending on the need of each RDS instance, Turbonomic can recommend:
- A change in the compute tier
- A change in the storage tier
- A change in the Storage Amount
- A change in the Provisioned IOPS (for io1 storage type)
- A combination of these actions
Turbonomic’s Action Center includes a high-level overview of all RDS scaling actions, including information about action disruptiveness and reversibility, making it easier for users to understand whether service downtime is required when executing an RDS scaling action:
Scaling up to assure performance
Below is an example of a scale up action due to VCPU congestion. In this case, Turbonomic identifies a performance risk for VCPU resources and generates a scale up action to add VCPU resources, assuring RDS performance and removing the risk:
Here is another example of a scale up action, this time because of IOPS congestion. From the screenshot below, we can see that IOPS is overutilized. In this case, Turbonomic generates an action to increase the storage amount to get more IOPS capacity for an RDS instance running on a gp2 storage tier.
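Why does adding storage add IOPS? On gp2, baseline IOPS are tied to volume size: AWS documents a baseline of 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS. The small sketch below shows how growing a gp2 volume raises its IOPS baseline:

```python
# gp2 baseline IOPS scale with volume size: 3 IOPS per GiB,
# floored at 100 IOPS and capped at 16,000 IOPS (per AWS EBS documentation).

def gp2_baseline_iops(size_gib: int) -> int:
    return min(max(3 * size_gib, 100), 16_000)

# Growing a gp2 volume from 100 GiB to 500 GiB raises its baseline
# from 300 IOPS to 1,500 IOPS:
before = gp2_baseline_iops(100)  # 300
after = gp2_baseline_iops(500)   # 1500
```

This is why a storage-amount increase can be the right remedy for IOPS congestion on gp2, even when the instance has plenty of free disk space.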
Here is an additional example of a scale up action that resolves an IOPS congestion performance risk, but this time Turbonomic also addresses underutilized VCPU by reducing VCPU compute resources, saving RDS cost. This action shows how Turbonomic evaluates all possible options and suggests the best action to assure performance while minimizing cost.
Considering DB Cache Hit Rate when Scaling VMEM in RDS
An AWS RDS instance will usually use all the memory given to it because of the nature of such a workload, so making a VMEM scaling decision based purely on VMEM utilization would be incorrect. Enter cache hit rate. Known as DB Cache Hit Rate in Turbonomic, this metric represents the ratio of requests to the RDS instance or cluster served from cache to the total number of requests made. Turbonomic discovers and calculates the cache hit rate of all Performance Insights-enabled RDS instances. This allows the Turbonomic system to make well-informed instance type sizing decisions that consider the relationship between cache hit rate and memory utilization.
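As a sketch of what that ratio means, here is a minimal cache hit rate calculation. The function name and the zero-traffic convention are illustrative assumptions, not Turbonomic's or AWS's implementation:

```python
# Illustrative sketch of the cache hit rate ratio described above.
# Function name and edge-case handling are assumptions for this example.

def db_cache_hit_rate(cache_hits: int, total_requests: int) -> float:
    """Fraction of requests answered from cache rather than from disk."""
    if total_requests == 0:
        return 1.0  # no traffic: nothing missed the cache
    return cache_hits / total_requests

# High memory utilization paired with a high hit rate is healthy, not congested:
rate = db_cache_hit_rate(cache_hits=9_900, total_requests=10_000)  # 0.99
```

The point of the metric is the pairing: 100% memory utilization with a ~100% hit rate signals the cache is working well, and may even mean the instance can shed memory safely.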
Below is an example where we scale down the compute tier because VMEM is underutilized and the DB Cache Hit Rate is at 100% for 95% of the time.
Scaling down for savings
In most customer environments, overprovisioned RDS instances are the norm. Usually, this is because DBAs struggle to make proper sizing decisions for their RDS instances and prefer to add unnecessary compute and storage resources to assure RDS performance, at the cost of an increased cloud bill. Turbonomic identifies cases where RDS instances are not using all the resources allocated to them and generates scaling decisions that reduce compute and storage resources, while assuring that RDS performance will not be harmed and RDS cost will be minimized.
Here is an example of an RDS instance which is not using its allocated IOPS. The instance is currently on io1 storage:
There are a few options to reduce IOPS resources, and hence the cost:
- Reduce the Provisioned IOPS and stay on io1
- Move to a different storage tier like gp2 with an appropriate storage amount, or Standard.
Turbonomic automatically evaluates all the possibilities and suggests the best course of action to maximize savings while assuring RDS performance. In this case, Turbonomic suggests moving to a gp2 storage tier and increasing the storage amount to have sufficient IOPS resources with the outcome of an overall reduced instance cost.
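A back-of-the-envelope comparison shows why the gp2 move can win. The prices below are illustrative list prices only (they vary by region and change over time); the scenario of 100 GB io1 with 1,000 provisioned IOPS versus 400 GB gp2 is a made-up example, not taken from the screenshots above:

```python
# Illustrative monthly storage cost comparison.
# Prices are example list prices only; actual prices vary by region and over time.
GP2_PER_GB = 0.10      # $/GB-month (storage only; IOPS included)
IO1_PER_GB = 0.125     # $/GB-month
IO1_PER_IOPS = 0.065   # $/provisioned-IOPS-month

def io1_monthly_cost(size_gb: float, provisioned_iops: int) -> float:
    # io1 bills for storage AND for every provisioned IOPS
    return size_gb * IO1_PER_GB + provisioned_iops * IO1_PER_IOPS

def gp2_monthly_cost(size_gb: float) -> float:
    # gp2 bills for storage only; IOPS come from volume size (3 IOPS/GB baseline)
    return size_gb * GP2_PER_GB

# 100 GB io1 with 1,000 provisioned IOPS:
io1 = io1_monthly_cost(100, 1000)   # 12.50 + 65.00 = 77.50
# 400 GB gp2, whose 1,200-IOPS baseline still covers the 1,000 IOPS need:
gp2 = gp2_monthly_cost(400)         # 40.00
```

Under these example prices, quadrupling the storage on gp2 is still roughly half the cost of the smaller io1 volume, because io1 charges separately for every provisioned IOPS.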
RDS scaling constraints
Turbonomic is aware of scaling constraints that are set by the RDS service and considers them in scaling decisions. One of the constraints Turbonomic evaluates in its RDS scaling analysis is the maximum supported connections for each combination of RDS instance type, memory capacity, and DB engine. Awareness of such scaling constraints is what makes Turbonomic RDS scaling actions trustworthy and safe.
In the following example, Turbonomic identifies the maximum supported connections based on the instance type, memory capacity, and DB engine combination and assures that current and historical connections utilization is honored when scaling to a new configuration:
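To illustrate why engine and memory matter here: RDS derives the default connection limit from instance memory, and the formula differs by engine. For MySQL, AWS documents the default parameter-group value as `{DBInstanceClassMemory/12582880}`. The sketch below applies that formula; note that `DBInstanceClassMemory` is in practice somewhat less than the instance's nominal RAM (the OS and RDS processes reserve some), so treating the full 8 GiB as available is a simplifying assumption:

```python
# RDS default max_connections for MySQL, per the documented parameter-group
# formula {DBInstanceClassMemory/12582880}. Other engines use different formulas.

def mysql_default_max_connections(memory_bytes: int) -> int:
    return memory_bytes // 12_582_880

# Simplifying assumption: use the full nominal 8 GiB of a db.m5.large.
# (Real DBInstanceClassMemory is slightly lower, so the true default is too.)
eight_gib = 8 * 1024**3
conns = mysql_default_max_connections(eight_gib)  # 682
```

Scaling down to a smaller memory tier therefore shrinks this ceiling, which is exactly why historical connection counts must be checked before a downsize is deemed safe.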
AWS RDS cost, details, and utilization
Turbonomic discovers and shows important RDS information for each RDS instance. It also shows historical utilization metrics for discovered RDS instances:
In addition, Turbonomic provides accurate RDS cost breakdown, including compute and storage costs for your RDS instances:
New automation policy for RDS makes implementing elasticity easier
For customers who want to manage their discovered RDS instances in Turbonomic, the new RDS policy design will help achieve that goal. With Turbonomic policies, customers can define different automation modes based on selected scopes, as well as adjust scaling analysis sensitivity to cater to different RDS environment types:
As customers undergo massive digital business and IT transformations, deploying workloads to the cloud, and especially to PaaS environments, they face familiar challenges: how to continuously navigate the tradeoffs between performance and cost, how to continuously identify the optimal configurations for their cloud resources and applications, and how to do it all automatically.
Turbonomic helps customers overcome these challenges, making it easier to adopt and effectively use the PaaS services that help transform their business. And by having software continuously optimize all their cloud resources for performance and cost, customers are able to achieve real business outcomes, whether that's assuring the performance of revenue-generating applications, minimizing cost, or significantly reducing end-user tickets.