
Mor Cohen-Tal

Climbing the Cloud Optimization Curve


It’s the 7th of the month and your CFO just received another cloud bill. Again, just like many times before, it has reached an all-time high much sooner than predicted or budgeted for. Sound familiar?

I hear the same story from almost every organization I am introduced to.

So what happens next?

Sometimes the CIO must face the board and ask for a budget increase, but usually before that happens, a committee is established and tasked with reducing the bill. The target: ensure the next bill is back on track, below a certain threshold. The team usually consists of top cloud architects, a few finance personnel and, in most cases, a subset of a newly formed “cloud governance” team. These top talents are pulled away from their day jobs to fix the problem of cloud bills running amok.

They take over a conference room and establish contact with the closest pizza deliverer. They spread the hundreds of pages of their cloud bills on the desk (or open them in spreadsheets) and try to find ways to reduce the bill. Sometimes they leverage a tool built to facilitate drilling through cloud bills, sometimes they create their own spreadsheets, and most commonly they use a combination of the two. They then try to correlate this information with what they gather from a multitude of monitoring tools and from conversations with application owners.

The outcome: multiple days later, a set of recommendations on how to get the bill below the targeted threshold is distributed: what components can be deleted, what should be rightsized, what can leverage cheaper storage, what RIs should be bought, and so on.

Most of the recommendations are then taken and the cloud bills slowly start decreasing. Everyone can breathe again and life goes back to normal. The team is dismantled and they go back to their day jobs.

Slowly but surely the cloud bills start creeping up again. It’s the 7th of the month, the CFO calls, the team is reassembled, a new target is set, a conference room is taken over and pizzas start getting delivered.

And so we continue... Cloud bills reach an all-time high, surpassing budget and expected growth. We find ways to reduce them, they creep up again, and we put another band-aid on them, creating what I now refer to as “The Saw” of cloud optimization.


Sharing with you that 80% of organizations are overspending in the cloud is probably very surprising. You're thinking "only 80%??". The reality is that cost overruns and blown cloud budgets are such a ubiquitous problem that they have almost been accepted as the reality of operating in the cloud. If anything, the problem is worse than it appears, because claiming we exceed budgets carries a built-in assumption that organizations have well-defined and thought-through budgets to exceed.

The truth is that teams rarely have any idea how much applications should cost, whether they were migrated or are completely greenfield. The bottom line is that the vast majority of applications cost more than they should in the cloud, and we don’t even know by how much.

The promise of the cloud is that you "only pay for what you use" but we haven't reached that holy grail - yet!

So how do we get out of this loop? By solving the problem, not the symptom. Cloud bills keep creeping up because we don’t continuously ensure the estate is optimized, and so we fall back into the same patterns that caused the problem to begin with.

Application teams continue developing, as they should: creating new versions of applications, deploying test beds and development environments, and resetting application configurations based on what they perceive they will need. All of this overrides the original optimization efforts and creates new idle and forgotten resources and new over-provisioned workloads. The RIs are not being managed, so the new deployments are not in line with what has been purchased.

Digital transformation and a rush to the cloud are placing enterprise IT teams under tremendous pressure. While cloud addresses an old pain point – that infrastructure supply is static while application demand is fluid – actually matching this demand with supply in real-time requires more decisions than any human being can make. Hybrid-cloud estates are unbelievably complex. There are millions of configuration options for EC2 instances alone, over 90 additional services available in AWS, and similar numbers in Azure. This is simply too much complexity for the average IT team to manage, and as a result, many organizations that kicked off digital transformation initiatives with high hopes end up watching innovation grind to a halt while the IT team struggles just to keep the lights on.

Cloud platforms enable elasticity and increase an organization’s ability to be agile, but how do you truly take advantage of these without drowning in overwhelming cloud bills?

By helping people focus on what they do best – develop, create, innovate – and letting software manage the complex resource and cost tradeoffs so that cloud environments are constantly optimized. The pace of innovation increases and cloud costs stay in check.

Keeping the environment constantly optimized means capitalizing on the promise of the cloud – only pay for what you use – by continuously ensuring that applications receive exactly the resources they need to deliver on their SLAs, as cost effectively as possible. A few capabilities are required to accomplish that:

  1. Full Stack visibility: Understand which applications consume which underlying resources across compute, storage and network, whether running on IaaS, containers or other services
  2. Real-Time Vertical and Horizontal Scale decisions: Understand application SLAs and act to ensure applications continuously get the resources they need to perform, as cost effectively as possible, within the constraints and policies defined by the business
  3. Delete/Suspend unused resources: Constantly clean unnecessary resources out of the estate
  4. Align with work cycles: Schedule suspension of workloads to ensure that when people aren’t using them, they also aren’t paying for them
  5. Leverage discounting mechanisms: RIs, promo SKUs, spot instances and so on
  6. Automation: Automate the actions, in real time or on schedules that fit change windows, and integrate optimization into deployment and the daily management of the estate, instead of relying on sporadic efforts
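To make capabilities 3 and 4 a little more concrete, here is a minimal Python sketch of the kind of check software could run continuously instead of a committee running it once a quarter. Everything in it is an assumption made for illustration: the `Workload` fields, the 5% CPU threshold, the 50 active hours per week and the prices are hypothetical, and no real cloud provider API is called.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 24 * 7  # 168 hours


@dataclass
class Workload:
    name: str
    hourly_cost: float         # illustrative price in dollars per hour
    avg_cpu_percent: float     # average utilization over some lookback window
    business_hours_only: bool  # True for dev/test systems nobody uses overnight


def find_idle(workloads, cpu_threshold=5.0):
    """Capability 3: flag workloads whose average CPU is below the threshold."""
    return [w for w in workloads if w.avg_cpu_percent < cpu_threshold]


def weekly_schedule_savings(workload, active_hours_per_week=50):
    """Capability 4: dollars saved per week by suspending outside active hours."""
    if not workload.business_hours_only:
        return 0.0
    suspended_hours = HOURS_PER_WEEK - active_hours_per_week
    return workload.hourly_cost * suspended_hours


# A made-up three-workload estate, purely for illustration.
estate = [
    Workload("checkout-api", 0.40, 62.0, False),
    Workload("old-test-bed", 0.20, 1.5, False),
    Workload("dev-sandbox", 0.10, 35.0, True),
]

idle = find_idle(estate)
savings = sum(weekly_schedule_savings(w) for w in estate)

print("Candidates to delete or suspend:", [w.name for w in idle])
print("Weekly savings from scheduling:", round(savings, 2))
```

In a real estate the inputs would come from monitoring and billing data, and the thresholds would be set by business policy rather than hard-coded. The point of the sketch is only that these checks are cheap to repeat, which is what separates continuous optimization from a one-off cleanup.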

Fortunately, there is some low-hanging fruit to harvest that can get cloud users started on the journey to intelligent cloud optimization. My next blog post will explore these capabilities in more depth, and in the context of more advanced techniques that can eliminate the loop altogether, at scale and across multi-cloud environments, through advanced AI, while also continuously assuring application performance and adhering to business policies.

 

 
