
Ben Yemini

Is there an optimal CPU utilization?

I had an interesting conversation yesterday with a leading provider of travel services about optimal CPU utilization in their virtual infrastructure. Their environment is diverse: more than 350 hosts and 2,000+ VMs spread across multiple remote locations and a few centralized ones. The hardware footprint includes Dell and HP rack-mount servers as well as Dell EqualLogic, EMC VNX, and VMAX block-based storage arrays. They are running three different hypervisors (VMware, Hyper-V, and RHEV) with SCOM and vCenter for management. They have virtualized 65% of their workloads, including IOPS-intensive applications such as SQL (most of the remainder are HPC). So what makes this interesting?

The reason we were even talking was their own admission that they are “playing it safe.”

They may overcommit vCPU, but not vMem, and they don't thin provision storage. They realized that they won't be able to keep adding resources indefinitely and will need to increase utilization across the board.

Increasing CPU utilization increases the risk of latency. In basic queueing models, queuing delay grows roughly in proportion to 1/(1 - utilization), so it rises sharply as utilization approaches 100 percent. Your optimal CPU utilization, or infrastructure utilization in general, therefore depends on your risk appetite.
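For intuition, here is a minimal sketch assuming a single-server M/M/1 queue with normalized service time (an assumption of mine, not how any particular product models latency). It shows how mean response time blows up as utilization climbs:

```python
# Rough illustration: mean response time in an M/M/1 queue as a function
# of utilization (rho). Service time is normalized to 1 unit, so
# response time = 1 / (1 - rho). Simplified queueing model, illustrative only.

def mm1_response_time(utilization: float, service_time: float = 1.0) -> float:
    """Mean response time for an M/M/1 queue at the given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)

for rho in (0.50, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95):
    print(f"utilization {rho:.0%}: response time x{mm1_response_time(rho):.1f}")
```

At 50% utilization a request takes about twice its bare service time; at 90% it takes ten times. That knee in the curve is why the average alone is misleading.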

Their challenge is to find an acceptable tradeoff between utilization (lower cost) and latency (higher risk). Based on this curve, 90% utilization would expose them to an unacceptably high latency risk. But where should they draw the line?

We used Turbonomic’s capacity planning feature to model their workloads at higher densities. Turbonomic simulates actual workloads (including the interdependencies caused by shared resources like storage arrays and CPU cores), so we could accurately model the CPU utilization (as well as memory, IOPS, and network) as we removed hosts from their environment.

Using VMTurbo to find the optimal CPU utilization for your datacenter
This view shows the current and simulated state of the environment. Each group of bars represents a physical host, and each bar within the group represents one utilization metric on that host: memory, CPU, IO, network, swapping, and so on. In this datacenter, we can safely turn off 14 hosts and still keep application latency at an acceptable level (this particular simulation retains HA but ignores affinity/anti-affinity rules).
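For a rough sense of the effect (a naive approximation that ignores the shared-resource interdependencies the real simulation accounts for, and the 60-host cluster size is hypothetical): if total demand stays fixed and spreads evenly, average utilization scales with the inverse of the remaining host count.

```python
# Naive back-of-the-envelope (illustrative only): conserve total demand and
# spread it evenly across the hosts that remain after some are suspended.

def projected_utilization(current_util: float, hosts_now: int, hosts_removed: int) -> float:
    remaining = hosts_now - hosts_removed
    if remaining <= 0:
        raise ValueError("cannot remove all hosts")
    return current_util * hosts_now / remaining

# Hypothetical 60-host cluster at 60% average CPU, suspending 14 hosts:
print(f"{projected_utilization(0.60, 60, 14):.0%}")  # ~78%
```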

We found that the 70-75% range was the right balance for their business. That utilization fully supported their average infrastructure demand, and it gave them enough buffer to absorb workload peaks without degrading performance.

The difference between their current utilization of 60% and their safe utilization target of 70-75% translated to hundreds of thousands of dollars in hardware and software licensing costs – all without degrading performance.
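Purely illustrative arithmetic (hypothetical inputs, not the customer's actual figures): if average demand stays constant, the number of hosts you need scales inversely with your utilization target.

```python
# Illustrative only: host count scales roughly with demand / target utilization.
# All numbers below are hypothetical.

hosts = 350            # hosts in the environment
current_util = 0.60    # current average CPU utilization
target_util = 0.725    # midpoint of the 70-75% target range

hosts_needed = hosts * current_util / target_util
print(f"hosts needed at target: ~{hosts_needed:.0f} "
      f"(~{hosts - hosts_needed:.0f} fewer than today)")
# -> roughly 290 hosts, i.e. about 60 fewer, before accounting for HA and peak headroom
```

Fewer hosts means fewer hardware refreshes and fewer per-socket licenses, which is where the savings come from.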

What is the optimal CPU utilization for your business?

You can theoretically build a CPU utilization / latency curve for your own business. In practice this is pretty difficult.

To do this, you first need to understand the resource consumption demands (peaks and averages) of existing workloads across the four "food groups" of CPU, memory, network, and storage, over appropriate time frames. Next, you need to understand the available supply from underlying hosts and datastores, and possibly from the underlying IT infrastructure, including physical storage arrays and compute fabrics. The final step is to use this data to align demand with supply (through VM placements across hosts and datastores) such that total infrastructure cost is minimized. The easiest way to accomplish this final step is to allow VMs to iteratively "shop" for resources from the suppliers until an equilibrium utilization is found.
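Here is a minimal sketch of that "shopping" step, using a toy price rule I chose for illustration (price grows as 1 / (1 - utilization)). The host names and numbers are hypothetical, and this is not Turbonomic's actual market algorithm:

```python
# Toy model of VMs "shopping" for CPU until utilization equalizes.
# Price rises steeply with utilization; each pass lets every VM move to
# the host that would be cheapest after the move. Illustrative only.

def price(used: float, capacity: float) -> float:
    u = used / capacity
    return 1.0 / (1.0 - u) if u < 1 else float("inf")

def shop(vms, hosts, max_rounds=100):
    """vms: list of (name, cpu_demand); hosts: dict of name -> cpu_capacity."""
    used = {h: 0.0 for h in hosts}
    placement = {}
    for name, demand in vms:  # initial placement: cheapest host first
        target = min(hosts, key=lambda h: price(used[h] + demand, hosts[h]))
        placement[name] = target
        used[target] += demand
    for _ in range(max_rounds):  # iterate until no VM wants to move
        moved = False
        for name, demand in vms:
            here = placement[name]
            best = min(hosts, key=lambda h: price(used[h] + (0 if h == here else demand), hosts[h]))
            if best != here:
                used[here] -= demand
                used[best] += demand
                placement[name] = best
                moved = True
        if not moved:
            break
    return placement, {h: used[h] / hosts[h] for h in hosts}

vms = [(f"vm{i:02d}", 4.0) for i in range(12)]            # 12 VMs, 4 GHz demand each
hosts = {"host-a": 24.0, "host-b": 24.0, "host-c": 24.0}  # GHz of CPU capacity
placement, util = shop(vms, hosts)
print({h: f"{u:.0%}" for h, u in util.items()})           # roughly even, ~67% each
```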

Now that you have your datacenter modeled as an economy, you can 'shock' it with peak loads and observe how latency increases. Build in just enough buffer to keep the workloads performant under average peak loads: that is your optimal utilization.
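Continuing the toy model above (the 1.3x peak multiplier is hypothetical): scale each VM's demand up to its peak, re-run the shopping step, and check whether utilization still sits below the knee of your latency curve.

```python
# Toy "shock" test built on the shop() sketch above: inflate demand to a
# hypothetical peak and see where host utilization lands.
PEAK_MULTIPLIER = 1.3
peak_vms = [(name, demand * PEAK_MULTIPLIER) for name, demand in vms]

_, peak_util = shop(peak_vms, hosts)
print(f"worst-case host utilization under peak: {max(peak_util.values()):.0%}")
# If this crosses the knee of your latency curve (say ~85-90%),
# lower your average-utilization target until it no longer does.
```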

So how did we do it with the travel company?

Internally, Turbonomic runs the same market simulation described above, both for planning and during live operations. Not only was the travel company able to decide that the 70-75% CPU utilization range was right for their business, they were also able to automatically maintain that target with Turbonomic's control system.

Using Turbonomic, you can adjust your target utilization. This enables you to gauge business risk, and balance that risk against higher utilization (and cost savings).

 

VMTurbo policy editor interface
Turbonomic policy settings allow administrators to set their desired infrastructure utilization.

Turbonomic Operations Manager can simplify these steps for you. It lets you build plans for a multitude of capacity "what-if" scenarios, ensuring that you always have the right amount of hardware at the right time to assure application service levels while using your infrastructure assets as efficiently as possible.

 
