There is no shortage of confusion about how CPU queueing works and how it ultimately affects your application and environment performance. Virtualization gave the industry something wonderful by enabling the sharing of physical hardware resources, but it also opened the door to hidden issues that IT ops and application developers still struggle with every day.
Let’s quickly review what CPU queueing is and how processor wait times can have a catastrophic effect further up the stack.
Virtual CPU Architecture Recap
Each physical CPU is divided into cores that the hypervisor shares among its virtual workloads. A 4-processor server with 24 cores per processor nets 96 total cores available to the virtual workloads.
Virtual machines can be allocated multiple vCPUs, which map onto the physical core and CPU architecture of the host server. VMs requesting CPU resources are queued according to that virtual-to-physical mapping.
A 4-vCPU VM running a multi-CPU application request requires not just virtual but physical access to 4 cores simultaneously. If any of those cores are busy processing existing requests, the 4-vCPU VM experiences wait time before its request can complete.
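To build intuition for why this happens, here is a deliberately simplified sketch of strict co-scheduling: a VM is only dispatched when all of its vCPUs can land on free physical cores at the same instant. (Modern ESXi actually uses relaxed co-scheduling, but the wait effect is the same in kind.) The function name and busy-probability model are illustrative assumptions, not the real scheduler.

```python
import random

def simulate_costop(total_cores, vm_vcpus, busy_prob, trials=100_000, seed=42):
    """Toy model of strict co-scheduling: the VM can only run when ALL of
    its vCPUs find free physical cores at once. Each core is modeled as
    independently busy with probability `busy_prob` (an assumption)."""
    rng = random.Random(seed)
    blocked = 0
    for _ in range(trials):
        free = sum(rng.random() >= busy_prob for _ in range(total_cores))
        if free < vm_vcpus:
            blocked += 1  # not enough free cores: the VM waits (co-stop)
    return blocked / trials

# On a 24-core host at ~70% busy, wider VMs wait far more often:
for vcpus in (1, 2, 4, 8):
    print(f"{vcpus}-vCPU VM blocked {simulate_costop(24, vcpus, 0.70):.1%} of slots")
```

Even this crude model shows the key point: the wider the VM, the more often the whole VM has to wait, even when most of its vCPUs are idle.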
VMware documentation indicates the following:
Even when only a part of the virtual CPUs is really required for the current workload in the virtual machine, that workload has to wait until all, even the currently unused virtual cores, can be run on the physical processor. This behavior is also known as "co-stop".
Numerous factors lead to co-scheduling wait times. Fluctuations in your application demand cause the underlying hypervisor and host to continuously try to schedule the available CPU resources. Those wait times, while seemingly small, have a very real impact.
What is the Impact of CPU Queuing?
How does this play out in your virtualization environment? It is typical to oversubscribe CPU and memory, allocating more virtual resources than the physical hardware provides; this is by design in a virtualization environment. But even a conservatively oversubscribed hypervisor host will see periods of high wait time that may not look like a problem to the naked eye. Your applications (and your customers and employees) will definitely pay the price of CPU queueing issues.
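The oversubscription ratio itself is simple arithmetic: total allocated vCPUs divided by physical cores on the host. A minimal sketch (the function name and example VM counts are illustrative):

```python
def vcpu_oversubscription(total_physical_cores, vcpus_per_vm):
    """Ratio of allocated vCPUs to physical cores on one host.
    `vcpus_per_vm` is a list with each VM's vCPU count."""
    allocated = sum(vcpus_per_vm)
    return allocated / total_physical_cores

# A 96-core host (4 sockets x 24 cores) running 40 VMs of 8 vCPUs each:
ratio = vcpu_oversubscription(96, [8] * 40)
print(f"{ratio:.1f}:1 vCPU-to-pCPU")  # 320 vCPUs / 96 cores -> 3.3:1
```

Even a ratio like this is common and workable on average; the trouble described above comes from the peaks, when many of those vCPUs demand physical cores at the same moment.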
Problem #1: Peaks = Performance Degradation
When CPU wait times peak, queues grow deeper. These peaks are difficult to see with the native tools, but the real effect is felt at the application layer as slow transactions and delayed application response.
Workloads of all types, across any hypervisor, will experience performance degradation and risk. You can see here, from a view of an active VDI environment, that peaks of CPU utilization are occurring and have been correlated to user experience issues.
CPU queue depth increases during those peak times, resulting in application delays. This is critical for latency-sensitive workloads and will negatively affect workloads of any kind.
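On vSphere, the closest native signal for this queueing is the CPU Ready "summation" counter: milliseconds the VM was runnable but waiting for physical CPU during a sample interval. VMware documents converting it to a percentage by dividing by the chart interval; a quick sketch of that conversion (the 20-second interval is the realtime chart default):

```python
def cpu_ready_percent(ready_summation_ms, interval_seconds=20):
    """Convert vSphere's CPU Ready 'summation' counter (ms the VM spent
    ready-to-run but waiting for a physical CPU) into a percentage of
    the sample interval. Realtime charts sample every 20 seconds."""
    return ready_summation_ms / (interval_seconds * 1000) * 100

# 1,000 ms of accumulated ready time in one 20 s realtime sample:
print(cpu_ready_percent(1000))  # 5.0 -> 5% CPU Ready, a common alert threshold
```

Watching this value during your utilization peaks is how you catch the queueing the summary charts hide.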
Not only is the direct application affected, but any VM causing high CPU usage and queueing will now impact every VM, every container, and every application on that same physical host. You now have a risk that one of your application servers could be taking your call center application offline or stopping transaction flow into your database applications.
Problem #2: Over/Under Allocation of Virtual CPUs
One of the most common multi-CPU-capable workloads is the virtualized SQL server. Most SQL sizing guides even recommend a best-practice minimum of 4 vCPUs, and often more.
Assume we have an 8-vCPU VM with 24 GB of RAM. The SQL application must queue until all 8 CPUs are available at once, which is likely to incur some wait time. That wait time translates to slower queries and poorer SQL performance, and any application depending on that SQL server will suffer as a result.
VMware even clearly documents the risk in their 82-page guide on recommended practices for architecting SQL for vSphere:
Even if the guest OS does not use some of its vCPUs, configuring VMs with those vCPUs still imposes some small resource requirements on ESXi that translate to real CPU consumption on the host. (1)
You can see the real impact of both utilization peaks and queueing on multi-vCPU systems. It's inevitable that you are experiencing it today, despite the native tools' best efforts to manage it.
Solving the Peak Performance Challenge
CPU utilization peaks and high CPU queue depths cannot be solved by simply relying on the native schedulers. This has proven true on both virtualized and containerized platforms.
This full impact is what led Turbonomic to automate the moving and sizing of resources, which produced a performance improvement from the application layer down, across the entire cluster being automated.
The next step was to virtually merge 25+ disparate clusters into 5 virtual "superclusters," which reduced the overall host count from over 80 hosts to under 60 in just 7 days! All of this happened while improving performance for the entire application environment, and the resulting host reallocation restarted a data center project that had been frozen for lack of ability to get new servers.
Solving the CPU Count Challenge
A real example came up recently that highlights how Turbonomic solves the performance problem for any application, including SQL servers. A DBA received notification of slow SQL performance, which they determined was related to CPU issues. The IT ops team investigated the host and saw no consistent patterns or active issues at the time they got the call from the DBA. Nobody was sure how to resolve the issue with the native tools and data.
Turbonomic indicated a resize action for the SQL server: size it down from 16 vCPUs to 8 in order to improve performance. While this may seem counter-intuitive, the Turbonomic platform identified that the SQL server would see consistently better application-layer performance because it would no longer wait for 16 physical cores to be free at the same instant. By moving to 8 vCPUs, the SQL application gains easier access to the available CPUs, which have a much lower queue depth.
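A back-of-the-envelope model shows why downsizing can win. Assume, purely for illustration, that each physical core is independently busy with some probability; then the chance that a VM finds all of its needed cores free at once shrinks exponentially with its vCPU count. (The real scheduler is more forgiving than this, but the direction of the effect holds.)

```python
def all_cores_free_prob(needed_cores, busy_prob):
    """Toy model: if each physical core is independently busy with
    probability `busy_prob`, the chance that `needed_cores` cores are
    all free at the same instant is (1 - busy_prob) ** needed_cores."""
    return (1 - busy_prob) ** needed_cores

# At 30% per-core busy probability, halving the vCPU count dramatically
# improves the odds of immediate dispatch (~5.8% vs ~0.3%):
for vcpus in (8, 16):
    print(f"{vcpus}-vCPU VM: {all_cores_free_prob(vcpus, 0.30):.1%}")
```

Under these toy assumptions, the 8-vCPU configuration is roughly seventeen times more likely to be scheduled without waiting than the 16-vCPU one, which is the intuition behind the counter-intuitive resize.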
Both the DBA and operations teams now have the precise decision, action, and analytics to back their choice. They can also automate a scale-up policy so that the servers scale dynamically as needed in the case of future contention.
Here is a view of an application as it began to reduce its peaks; further actions are already available to decrease the volatility and increase overall application performance.
Now we see what happens when you enable Turbonomic to place and scale VMs: a marked reduction in peak utilization and in CPU Ready queue (VMware), and an increase in overall application performance as a result.
This is proof that sizing down for performance can be the right solution, and using Turbonomic Application Resource Management ensures it can be done dynamically, without risk.
Performance Goes Far Beyond Peak and Queues
These two techniques, reducing peaks and rightsizing for CPU performance, have been proven out across thousands of environments, but they address just the two specific issues we wanted to highlight.
There are many other performance risks spanning memory, storage, network, and much more; here we focused specifically on the effect of CPU queues on application performance. Look for more posts on the blog for other valuable performance tips and tricks, and for how the Turbonomic Application Resource Management platform can help you!