You received your first cloud bill and nearly passed out. Wasn’t the cloud supposed to be cheaper than hosting workloads on your own hardware?
Don’t worry: you’re not alone in underestimating what the cloud will cost, and there are things you can do to reduce it. In this three-part series we’re going to review strategies for managing cloud costs and help you tame your cloud bill. In this article we’re going to look at:
- Effectively scaling your workloads to meet user demand
- Controlling your budget
Effectively Scaling Your Workloads
It is common in an on-premises environment to over-provision virtual machines to ensure that you can meet user demand. After all, you already own the hardware, and giving your workloads extra capacity ensures that they have the resources to handle whatever load your users send at them. The same strategy works in the cloud, but there you pay by the hour for the extra capacity you allocate, whether or not your workloads use it.
The goal in any cloud deployment is to ensure application performance while maximizing the efficiency of your environment. A virtual machine running in the cloud at 10% CPU utilization should not give you a sense of calm; it should make you anxious about the bill you will receive at the end of the month.
Cloud applications are elastic, meaning that they are designed to scale up quickly as load increases and scale back down when load diminishes. The question is how to manage that elasticity effectively.
Let’s consider the example shown in Figure 1.
An eTailer might experience significant load on Fridays at 5:00pm, when people get off work and start their weekend browsing, but that same eTailer might have little or no load on Tuesdays at 2:00am. As a result, the eTailer will want to reduce its VM footprint during off-peak times.
So how can the eTailer accomplish this? There are several strategies:
- Scale down based on the hour of day and day of week
- Set up an auto-scaling group configured with a static threshold, such as CPU or memory utilization, that automatically scales the environment down when utilization decreases and back up when it increases
- Adopt a solution that keeps the environment in its desired state, upholding the tenet above: ensure application performance while maximizing the efficiency of the environment
Scaling based on time is a dangerous option. If you adopted this strategy, what would happen if your application were suddenly adopted by clients in another time zone? 2:00am on Tuesday in EST is 7:00am in England. This strategy might minimize your cost, but at the expense of performance exactly when it is needed.
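To make the time zone hazard concrete, here is a minimal sketch of a purely time-based scaling rule. The schedule, the instance counts, and the function name `scheduled_instances` are hypothetical, and the sketch assumes fixed UTC offsets (EST = UTC-5, London in winter = UTC+0) rather than real daylight-saving rules. The rule only ever looks at the US Eastern clock, so it happily shrinks to its overnight size at the moment London users are starting their day.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets that ignore daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")
LONDON_WINTER = timezone(timedelta(hours=0), "GMT")

# Hypothetical schedule, keyed only on the US Eastern clock.
FRIDAY_EVENING_INSTANCES = 20
DAYTIME_INSTANCES = 8
OVERNIGHT_INSTANCES = 2


def scheduled_instances(now: datetime) -> int:
    """Instance count chosen by a purely time-based rule."""
    local = now.astimezone(EST)
    if local.weekday() == 4 and local.hour >= 17:  # Friday from 5:00pm EST
        return FRIDAY_EVENING_INSTANCES
    if local.hour < 6:  # midnight to 6:00am EST, any day
        return OVERNIGHT_INSTANCES
    return DAYTIME_INSTANCES


# 2024-01-16 was a Tuesday.
tuesday_2am = datetime(2024, 1, 16, 2, 0, tzinfo=EST)
print(scheduled_instances(tuesday_2am))  # 2
# The same instant on a London clock.
print(tuesday_2am.astimezone(LONDON_WINTER).strftime("%a %H:%M"))  # Tue 07:00
```

Nothing in the rule can notice the new demand; someone has to edit the schedule by hand after the fact.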
Auto-scaling groups are a powerful tool that cloud providers give you, but your environment is very dynamic and static thresholds may not be good enough. Under very low load they may scale your environment down too far, and under very high load they may not scale it up fast enough to meet user demand.
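The "not fast enough" problem can be illustrated with a small simulation. The sketch below is a hypothetical static-threshold rule that adds or removes one instance per evaluation period, loosely in the spirit of simple scaling policies; it is not any cloud provider's API, and the thresholds, limits, and the `ThresholdPolicy`/`simulate` names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    """Hypothetical static-threshold rule: step one instance at a time."""
    scale_out_cpu: float = 70.0  # add an instance above this average CPU %
    scale_in_cpu: float = 30.0   # remove an instance below this average CPU %
    min_instances: int = 1
    max_instances: int = 20

    def next_count(self, current: int, avg_cpu: float) -> int:
        if avg_cpu > self.scale_out_cpu:
            current += 1
        elif avg_cpu < self.scale_in_cpu:
            current -= 1
        # Never go below the floor or above the ceiling.
        return max(self.min_instances, min(self.max_instances, current))


def simulate(policy, demand_vcpus, start, vcpus_per_instance=2.0):
    """Apply the policy once per evaluation period; return the instance counts."""
    count = start
    history = []
    for demand in demand_vcpus:
        # Average CPU % across the group, capped at 100% when saturated.
        cpu = min(100.0, 100.0 * demand / (count * vcpus_per_instance))
        count = policy.next_count(count, cpu)
        history.append(count)
    return history


# Demand jumps to 20 vCPUs and stays there; the group starts at 2 instances.
print(simulate(ThresholdPolicy(), [20.0] * 15, start=2))
```

Under these assumptions the group needs 13 evaluation periods to climb from 2 to 15 instances, and users see saturated servers the whole way up. In the other direction, a near-idle group drifts down to the single-instance floor, which may be too small to absorb the next burst.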
The best solution, therefore, is to keep your application in a “desired state” at all times. The desired state means that you have enough virtual machine instances to meet user demand and ensure application performance, but not so many, or such large, virtual machines that you sacrifice efficiency and inflate your cloud bill.
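One way to picture "enough, but not too many or too large" is as a small optimization: for the current demand, pick the instance type and count with the lowest hourly cost that still keeps utilization at or below a target. This is a minimal sketch under stated assumptions, not the tooling described in this series; the catalog, prices (in integer cents), the 70% target, the two-instance floor, and the name `desired_state` are all hypothetical.

```python
import math
from typing import NamedTuple


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    cents_per_hour: int


# Hypothetical catalog: names and prices are placeholders, not real offerings.
CATALOG = [
    InstanceType("small", 2, 10),
    InstanceType("medium", 4, 19),
    InstanceType("large", 8, 40),
]


def desired_state(demand_vcpus, catalog, target_util=0.7, min_instances=2):
    """Cheapest (type, count, cents/hour) that keeps utilization at or below target."""
    best = None
    for itype in catalog:
        usable = itype.vcpus * target_util  # leave headroom above the target
        # Enough instances to carry the demand, but never below the floor.
        count = max(min_instances, math.ceil(demand_vcpus / usable))
        cost = count * itype.cents_per_hour
        if best is None or cost < best[2]:
            best = (itype.name, count, cost)
    return best


print(desired_state(10, CATALOG))  # ('medium', 4, 76)
print(desired_state(1, CATALOG))   # ('small', 2, 20)
```

Re-running a calculation like this as demand changes is what keeps the environment near its desired state; a real system would also weigh memory, storage, and the cost of moving workloads.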
Unfortunately, because of the dynamic nature of your application and its many interrelated parts, achieving and maintaining a desired state is not something a system administrator can do by hand. Instead, it requires tooling that continuously monitors your environment and adjusts it based on your workloads.
We like to view a cloud-based application as a supply chain, as shown in Figure 2.
Your application runs on virtual machines, which run on a host (such as a set of physical machines in an availability zone), which in turn runs in a data center, or a region in cloud terms. Each virtual machine needs resources such as CPU and memory, but it also needs access to storage and a certain number of IOPS (input/output operations per second). By ensuring that each node across the supply chain receives the capacity it needs, we can keep your application and environment in a desired state.
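The supply-chain view can be sketched as a tree in which each node supplies capacity to the nodes that consume from it, and demand flows upward. The model below is a hypothetical illustration, not the product's data model: the `Node` class, the `check_chain` walk, the resource keys, and the 20%/80% utilization band are assumptions. Storage and IOPS would simply be additional resource keys.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One link in a hypothetical supply chain: a VM, host, or zone."""
    name: str
    capacity: dict = field(default_factory=dict)    # resource -> units supplied
    own_demand: dict = field(default_factory=dict)  # demand originating at this node
    consumers: list = field(default_factory=list)   # nodes that draw on this one

    def demand(self) -> dict:
        """Total demand this node passes up to its own supplier."""
        total = dict(self.own_demand)
        for child in self.consumers:
            for resource, amount in child.demand().items():
                total[resource] = total.get(resource, 0) + amount
        return total


def check_chain(node, low=0.2, high=0.8, findings=None):
    """Walk down the chain and flag resources outside the [low, high] utilization band."""
    if findings is None:
        findings = []
    used = node.demand()
    for resource, cap in node.capacity.items():
        util = used.get(resource, 0) / cap
        if util > high:
            findings.append((node.name, resource, "congested"))
        elif util < low:
            findings.append((node.name, resource, "over-provisioned"))
    for child in node.consumers:
        check_chain(child, low, high, findings)
    return findings


# Two VMs on one host; each VM's own_demand stands in for the application on it.
vm1 = Node("vm-1", {"cpu": 4, "mem_gb": 16}, {"cpu": 3.6, "mem_gb": 2})
vm2 = Node("vm-2", {"cpu": 4, "mem_gb": 16}, {"cpu": 2, "mem_gb": 8})
host = Node("host-a", {"cpu": 32, "mem_gb": 128}, consumers=[vm1, vm2])
for finding in check_chain(host):
    print(finding)
```

In this toy example vm-1 is short on CPU yet has far more memory than it uses, and the host is lightly loaded overall: exactly the kind of mismatch that a single per-VM threshold would not reveal.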
Controlling Your Budget
Another challenge in managing your cloud bill is the bill itself. AWS and Azure bills are aggregated across services, regions, accounts, and lines of business. When you review your bill, the bottom line is easy to see, but understanding what contributes to it is more difficult. Some customers we have worked with have cloud bills that run to more than 100,000 lines in a spreadsheet.
In short, understanding your cloud bill and acting to reduce it is not a task that can be performed manually; it takes tools and data analysis. Furthermore, your cloud spend needs to be analyzed against your budget to ensure that it does not grow out of control.
Figures 3-5 below show some important ways to break down your cloud bill.
By understanding which services, which business accounts, and even which cloud service providers, such as AWS and Azure, contribute most to your cloud costs, you can see where your cloud money is being spent and react accordingly.
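At its core, each of these breakdowns is a group-by-and-sum over bill line items. The sketch below assumes the line items have already been normalized into a common shape; the field names (`provider`, `account`, `service`, `cost_usd`) and the dollar amounts are made up for illustration, since real AWS and Azure billing exports use their own column names and would need mapping first.

```python
from collections import defaultdict

# Hypothetical, already-normalized line items with made-up amounts.
LINE_ITEMS = [
    {"provider": "AWS", "account": "retail", "service": "EC2", "cost_usd": 1200.0},
    {"provider": "AWS", "account": "retail", "service": "S3", "cost_usd": 300.0},
    {"provider": "AWS", "account": "analytics", "service": "EC2", "cost_usd": 450.0},
    {"provider": "Azure", "account": "analytics", "service": "Virtual Machines", "cost_usd": 800.0},
    {"provider": "Azure", "account": "retail", "service": "Blob Storage", "cost_usd": 150.0},
]


def breakdown(items, key):
    """Sum cost along one dimension, largest contributor first."""
    totals = defaultdict(float)
    for item in items:
        totals[item[key]] += item["cost_usd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for dimension in ("service", "account", "provider"):
    print(dimension, breakdown(LINE_ITEMS, dimension))
```

With 100,000+ real line items the same idea holds; the hard part is usually the normalization and tagging that make the grouping keys trustworthy.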
Finally, it is important to compare your spend against your budget and to raise an alert when spending is trending high enough to exceed it. Figure 6 shows how we can analyze, in real time, how your cloud spending is tracking against the budget that you’ve allocated.
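A simple way to decide whether spending is "trending too high" is a linear run-rate forecast: divide spend so far by the days elapsed, multiply by the days in the month, and compare with the budget. This is a minimal sketch of that swapped-in technique, not how the analysis in Figure 6 works; the function names are hypothetical, and it assumes the spend figure covers day 1 through today inclusive and that spending is roughly even across the month.

```python
import calendar
from datetime import date


def forecast_month_end(spend_to_date: float, today: date) -> float:
    """Linear run-rate projection: average daily spend so far times days in the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Assumes spend_to_date covers day 1 through `today` inclusive.
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month


def budget_alert(spend_to_date: float, budget: float, today: date):
    """Return the projected month-end spend and whether it exceeds the budget."""
    projected = forecast_month_end(spend_to_date, today)
    return projected, projected > budget


# $4,000 spent by April 10 projects to $12,000 for a 30-day month.
print(budget_alert(4000.0, 10000.0, date(2024, 4, 10)))  # (12000.0, True)
```

A flat run rate is easy to reason about but ignores weekly and seasonal patterns like the eTailer's Friday peak, so it is best treated as an early-warning signal rather than a precise forecast.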
It can be tricky to maintain application performance while also controlling your cloud costs. This article has reviewed two key considerations:
- Scaling your cloud environment to maximize efficiency while still ensuring application performance
- Understanding your cloud bill and reviewing your cloud spend to ensure that you control your budget
In the next installment we’ll review workload-specific costs and how best to understand cloud regions, tags, and custom groups for controlling your cloud costs.