Being Ready - The Ready Queue and Sizing Conundrum
This series is a dialogue between Operations and Development perspectives, aimed at understanding the gap between how each views technology and measures its success. It is entirely possible for Ops and Dev to succeed as teams while the organization fails. So what can we do to better align Ops and Dev so that they speak the same language and work towards a common goal? This series addresses a portion of the problem by giving development teams insight into how specific decisions affect the day-to-day operational requirements of an application. In this case, let’s look at the Ready Queue.
We’ve all seen it before; the request comes in from the application development team with a simple one-liner: “We need four 16 vCPU machines for the application environment” and the operations team winces at the idea. Why is that?
The challenge with this scenario is that the first step should be to ask what the application environment actually requires. The developers have made it clear that they consider the application CPU intensive, but we still need to ask whether it really requires, or will effectively use, such a CPU-heavy configuration.
Are you ready? Understanding a CPU Ready Queue
The CPU Ready Queue gives us a critical virtual machine metric, measured as the percentage of time during which a guest has CPU work queued and ready to run, but no physical CPU is available to service it. It is an often-misunderstood metric, and it can vary for many reasons. Let’s look at an analogy to get us started.
Picture the on-ramp of a busy highway. There is a steady stream of cars on the highway, each followed by a gap with room for a small car to merge. In the ideal situation this works like a zipper, with one car entering each gap, but that relies on the inbound vehicle being sized to fit the gap available.
This is the ideal situation, but just imagine what happens if what is trying to merge isn’t a small car, but rather a bus.
This is exactly what happens when the available CPU cycles (left lane) are suddenly faced with a requesting workload (right lane) that is larger than the available gap. The very thing you would imagine as the worst-case scenario occurs. The cars ahead of the bus merge smoothly into the available spaces, but the bus is left waiting for a gap large enough to squeeze into. Accommodating the bus disrupts the flow of traffic, and inevitably a queue builds up behind the bus while it merges.
Our CPU Ready Queue works the same way. If every CPU work item in the queue is the size of an available physical CPU, then it will be moved through in a nominal fashion and the CPU scheduler in our hypervisor will keep the flow going as efficiently as possible. This will be illustrated by a low percentage CPU Ready metric that we usually represent as %RDY.
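To make %RDY concrete, here is a minimal sketch of the arithmetic. It assumes the monitoring tool reports ready time as milliseconds accumulated over each sample interval, as vSphere performance charts do with their 20-second real-time interval. The function name and parameters are our own illustration, not part of any vendor API.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert accumulated ready time (ms) in one sample interval to a percentage.

    Assumption: ready_ms is the time the guest spent ready to run but waiting
    for a physical CPU. If the counter is summed across all vCPUs of the
    guest, pass vcpus to get a per-vCPU average.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# 1,000 ms of waiting inside a 20-second sample is 5% ready time.
print(cpu_ready_percent(1000))
# The same 1,000 ms summed across 4 vCPUs averages 1.25% per vCPU.
print(cpu_ready_percent(1000, vcpus=4))
```

Which percentage counts as "too high" depends on the workload, so treat any threshold as something to calibrate in your own environment.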
What Causes the CPU Ready Queue to Grow?
Our analogy was about fitting a vehicle into a gap, but let’s look at real CPU contention in our virtualized data center to see which situations cause it. These include:
- CPU affinity
- CPU oversubscription (too many guest vCPUs)
- CPU reservations
Let’s quickly go through what each of these means so that we can understand where the limitations are.
When we assign multiple virtual CPUs to a guest machine or instance, we create a potential bottleneck: every request that the CPU scheduler services for that guest requires a single physical CPU with enough free cores to accommodate all of the guest’s virtual CPUs.
Take a host with 4 dual-core CPUs and a virtual guest with 2 vCPUs. The guest needs an open space on an entire physical CPU to have its request handled. This doesn’t seem like a problem with 4 CPUs, but what if there are 15 other guests running extremely CPU-intensive applications? This is a fast path to a growing %RDY number as the dual-vCPU requests line up behind the other CPU requests in the scheduling queue.
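The host described above can be sketched as a toy model. This is not how any real hypervisor scheduler is implemented. It assumes strict co-scheduling (a guest runs only when all of its vCPUs fit at once), treats the host as a flat pool of 8 cores, assumes the 15 other guests each have a single, always-busy vCPU, and rotates the starting guest every tick for fairness. All names are hypothetical.

```python
def ready_percent(wide_vcpus, narrow_guests=15, cores=8, ticks=1600):
    """Toy model: percent of ticks the multi-vCPU guest spends waiting."""
    # Guest 0 is the multi-vCPU guest; the rest are single-vCPU guests.
    widths = [wide_vcpus] + [1] * narrow_guests
    waited = 0
    for tick in range(ticks):
        free = cores
        # Rotate the starting guest each tick so nobody is permanently first.
        for offset in range(len(widths)):
            guest = (tick + offset) % len(widths)
            if widths[guest] <= free:
                free -= widths[guest]  # all of its vCPUs run together
            elif guest == 0:
                waited += 1  # ready to run, but no gap wide enough
    return 100.0 * waited / ticks


for width in (1, 2, 4, 8):
    print(width, "vCPU guest waits", ready_percent(width), "% of ticks")
```

In this model the host is already overcommitted, so even a single-vCPU guest waits half the time. The point is the trend: the wider the guest, the more often it finds no gap large enough and the higher its ready time climbs.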
Inside the application itself, we should also ask whether it is capable of true multi-threading. Quite often an application drives up CPU utilization, which increases the ready queue, and when another vCPU is attached to the guest we don’t actually see a benefit because the application is unable to spread its threads across the vCPUs. The result is a performance graph in which one vCPU does nearly all of the work.
The performance monitor shows a large spike on the first of four vCPUs. For our CPU scheduler, this drives up the queue depth, forcing lower performance overall for our current application instance and for the other application environments co-located on the same host.
There appears to be no multi-threading happening in the application based on this graph, or there is a runaway thread on a single CPU. Tracking the issue can be challenging in this case, and during utilization spikes we are increasing latency, which in turn increases application response times.
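One way to spot this pattern without eyeballing a graph is to check how much of the total load lands on the busiest vCPU. The sketch below assumes you already have per-vCPU utilization percentages from your monitoring tool. The function names and the 80% threshold are hypothetical choices for illustration, not established guidance.

```python
def busiest_vcpu_share(per_vcpu_util):
    """Fraction of the guest's total CPU load carried by its busiest vCPU."""
    total = sum(per_vcpu_util)
    if total == 0:
        return 0.0
    return max(per_vcpu_util) / total


def looks_single_threaded(per_vcpu_util, share_threshold=0.8):
    """True when one vCPU carries most of the load on a multi-vCPU guest."""
    return (len(per_vcpu_util) > 1
            and busiest_vcpu_share(per_vcpu_util) >= share_threshold)


# One hot vCPU out of four, similar to the graph described above.
print(looks_single_threaded([97, 3, 2, 4]))     # True
# Load spread fairly evenly across four vCPUs.
print(looks_single_threaded([60, 55, 62, 58]))  # False
```

A single sample can mislead, because a runaway thread and a genuinely single-threaded application look the same at one point in time. Running the check over a longer window helps tell the two apart.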
In an environment where response times are part of an SLA (Service Level Agreement), this means a potential failure to meet the SLA. Even in the absence of an SLA, the overall consumer experience will become degraded. Degradation of performance is a sure way to create an unhappy customer situation.
There have been many “best practices” suggested over the years since virtualization entered the mainstream. CPU affinity, which pins a workload to a specific physical CPU, has been a contentious issue throughout the evolution of data center virtualization. The idea behind allocating CPU this way is to guarantee that a particular guest has priority on its CPU. The challenge is that we also limit the ability to respond dynamically to changes in workload.
There are a number of reasons why affinity may be a valid design choice, but once its advantages are weighed against its drawbacks, more and more organizations are moving away from the practice. Affinity forces the workload onto a particular part of the environment and leaves it in a static state. There will be no adjustment to where the guest runs, because we have effectively made it a completely manually managed instance.
This is a fundamental problem of the traditional model of managing the virtualized workload. How many people who have used affinity rules will revisit them? How many times do we re-evaluate the current state and compare to the desired state?
Affinity itself is a bigger topic, but you now have a good handle on how it can affect our CPU problem. We will dive deeply into affinity a little later in the series and tackle it on its own.
What’s Next for Dealing with CPU Ready Queue?
Beware of oversubscription! Right-sizing your environment is the path to the best achievable efficiency. When we use multi-vCPU guests, we need to be sure that the vCPUs are actually being used effectively, or else we risk creating bottlenecks.
In our next article we will look at vCPU oversubscription and CPU reservations, and then tackle the mitigation strategies, both good and bad, to see how to deal with the CPU Ready Queue problem and the drive towards controlling the desired state.
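As a rough starting point for that conversation, the sketch below computes the vCPU-to-core ratio on a host and a first-guess vCPU count for a guest based on its observed peak utilization. The 70% target and both function names are assumptions made for illustration. This is a simple heuristic, not a vendor formula, and it ignores whether the application can actually use multiple threads.

```python
import math


def oversubscription_ratio(total_vcpus, physical_cores):
    """vCPUs allocated across all guests per physical core on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_vcpus / physical_cores


def suggested_vcpus(current_vcpus, peak_util_percent, target_util_percent=70.0):
    """Smallest vCPU count that keeps the observed peak at or under the target.

    Assumption: peak_util_percent is the guest's peak average utilization
    across all of its current vCPUs.
    """
    needed = current_vcpus * peak_util_percent / target_util_percent
    return max(1, math.ceil(needed))


# Four 16-vCPU guests on a host with 8 physical cores.
print(oversubscription_ratio(4 * 16, 8))  # 8.0
# A 16-vCPU guest that peaks at 20% utilization.
print(suggested_vcpus(16, 20))            # 5
```

Numbers like these are a prompt for questions to the development team, not a replacement for them.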