
Matt Ray

More Hardware vs. Data Center Control

I’ve run into quite a few enterprises recently that are in the business of throwing hardware at problems, or that used to be and have since wised up and decided to look at the problem differently.

As an example, I visited a large customer the other week that was significantly overallocated in terms of vCPUs to physical CPU cores. Across the entire environment, CPU ready time had shot up above 20%. The conversation went something like this:

Me: “Wow, you guys have some of the worst ReadyQ times I’ve ever seen! Did you know that?”

Manager: “Yes, we know. It kills performance. Unfortunately, we have been slow in purchasing new hardware to fix the problem.”

Me: “You’re planning on adding more hardware? How long do you expect that to fix the problem for?”

Manager: “We’re in the process of doubling our host count. Next time we’ll be more proactive in buying more hardware so that it does not get this bad again.”

WOW! Doubling the host count. Their solution to a performance problem was to throw far more hardware at it than they really needed. Memory and CPU utilization within the cluster were both in the 50-60% range. We went on to explain that the supply of resources was not the limiting factor in the environment. The problem was that the supply of resources was not being managed optimally to meet the demand coming from the virtual machines.
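To make the two numbers in this story concrete, here is a minimal sketch of how an overcommit ratio and a CPU ready percentage can be computed. It assumes the ready value is a vSphere-style summation in milliseconds for a single vCPU over a 20-second real-time sampling interval; the function names and the sample figures are hypothetical and are not taken from this customer's environment.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    Assumption: ready_ms covers one vCPU over one sampling interval, and the
    default interval is the 20-second real-time chart interval.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0)


def overcommit_ratio(total_vcpus: int, physical_cores: int) -> float:
    """Ratio of allocated vCPUs to physical cores (1.0 means no overcommit)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


# Hypothetical sample values, for illustration only.
ready_pct = cpu_ready_percent(4000)   # 4000 ms of ready time in a 20 s window
ratio = overcommit_ratio(96, 16)      # 96 vCPUs allocated on 16 physical cores

print(f"CPU ready: {ready_pct:.1f}%  overcommit: {ratio:.1f}:1")
```

A host can show moderate utilization and still report high ready time, because ready time measures how long vCPUs wait to be scheduled, not how busy the cores are on average.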

Unfortunately, this isn’t an uncommon problem. We regularly run into organizations that feel the best way to fix a problem is to throw hardware at it. The problem is that throwing hardware at an issue is a patch, not a fix. With a growing organization, especially one that is rapidly expanding into delivery of digital content, the problems come back quickly. Sure, by throwing hosts at the issue they’ll be able to grow for 6 months with underutilized infrastructure, but then they’ll have the same issues. In the case of this customer, the CPU ReadyQ problem will come back; for other customers, the problem may be host-level swapping or disk access latency.

At the end of the day, all of these potential problems come back to one question: how do we appropriately match the supply of resources on the physical infrastructure with the demand for resources from the application workloads? This is where VMTurbo comes into play, acting as a control platform that intelligently matches supply with demand in the infrastructure and delivers the results as a set of actionable items.

[Screenshot: ReadyQ to-do list]

In the case of the above customer, we asked them to take a step back and, instead of immediately adding more hardware, give proper workload management a chance to improve performance. The result was incredible.

[Screenshot: CPU Ready, after]

CPU Ready within the environment dropped by more than 10% in less than 10 minutes. VMTurbo was able to show that, instead of doubling the hosts in the cluster, the customer could effectively manage the current infrastructure with one additional host.

In addition to using placement of virtual machines to better manage CPU Ready within the environment, VMTurbo was able to recommend which VMs should be downsized in order to further reduce CPU Ready times.

[Screenshot: ReadyQ recommendation to reduce vCPUs]
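The idea behind a downsizing recommendation can be sketched in a few lines. This is not VMTurbo's algorithm, just a simple illustration: it assumes each VM reports a peak CPU utilization as a fraction of its allocated vCPUs, and it picks the smallest vCPU count that keeps that peak at or below a target utilization. The `VM` class, the 70% target, and the sample VMs are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    peak_cpu_util: float  # peak usage as a fraction of allocated vCPUs (0.0-1.0)


def suggest_vcpus(vm: VM, target_util: float = 0.7) -> int:
    """Smallest vCPU count that keeps peak usage at or below target_util."""
    needed = vm.vcpus * vm.peak_cpu_util / target_util
    # Never suggest fewer than 1 vCPU or more than the VM already has.
    return max(1, min(vm.vcpus, math.ceil(needed)))


def downsize_candidates(vms, target_util: float = 0.7):
    """Return (name, current_vcpus, suggested_vcpus) for VMs that can shrink."""
    result = []
    for vm in vms:
        suggested = suggest_vcpus(vm, target_util)
        if suggested < vm.vcpus:
            result.append((vm.name, vm.vcpus, suggested))
    return result


# Hypothetical inventory, for illustration only.
vms = [
    VM("web-01", 8, 0.15),
    VM("db-01", 4, 0.65),
    VM("batch-01", 16, 0.30),
]
candidates = downsize_candidates(vms)
for name, current, suggested in candidates:
    print(f"{name}: {current} -> {suggested} vCPUs")
```

Fewer vCPUs per VM means fewer virtual processors competing for the same physical cores, which is why right-sizing can lower ready time without adding hosts.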

By introducing a system that keeps the environment under control and in a healthy state, rather than throwing hardware at the problem, the customer was able to significantly reduce expected future hardware costs and guarantee that revenue-generating apps remain performant.
