Kubernetes pods are often described as self-healing, which is why the term "self-healing pods" has stuck. However, "self-healing" and "performance-assuring" do not mean the same thing. For our customers running multitenant environments with hundreds of applications and services, maintaining performance and application availability under fluctuating demand was a real challenge, even with self-healing pods.
For example, one financial institution’s initial implementation of OpenShift was relatively static compared to the dynamism Kubernetes is designed to manage. Occasionally, workloads peaking at the same time would congest a node; in one instance the team even lost a node while capacity was still available elsewhere in the cluster. This was an enormous challenge because they manage multitenant workloads with constraints such as labels/selectors, taints/tolerations, and affinity/anti-affinity rules. As the environment grew, this manual approach became impossible for the team to maintain and untenable for the business.
How It Works: Kubernetes Pod Self-Healing
The Kubernetes scheduler places a pod based on the workload’s specifications (requests, tolerations, etc.) and an assessment of the cluster’s available resources to find the best possible, compliant node. This decision is made every time a workload enters the scheduler’s queue. But after the pod is scheduled and workload demand fluctuates, nothing answers the question “Is this node still the best place to run this workload?” The only recourse for node congestion is to wait for pods to be evicted, which places them back into the queue. This cycle is what is referred to as Kubernetes pod self-healing: Kubernetes reschedules the evicted pod and deems it healed.
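The one-time nature of this decision can be illustrated with a toy sketch. This is not the real kube-scheduler (which runs many filter and score plugins); the node and pod structures here are simplified stand-ins, but the shape of the logic is the same: filter out nodes that violate constraints or lack capacity, score the survivors, place the pod once, and never revisit the choice.

```python
# Toy sketch of the scheduler's one-time placement decision (illustrative
# only; the real kube-scheduler uses many filter and score plugins).

def feasible(node, pod):
    """Filter: the pod must tolerate every taint and fit within free capacity."""
    if any(t not in pod["tolerations"] for t in node["taints"]):
        return False
    return (node["free_cpu"] >= pod["req_cpu"]
            and node["free_mem"] >= pod["req_mem"])

def schedule(nodes, pod):
    """Score: among feasible nodes, prefer the one with the most free CPU.
    Crucially, this decision is made once, at scheduling time; it is never
    revisited as demand on the chosen node fluctuates afterwards."""
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None  # pod stays Pending
    return max(candidates, key=lambda n: n["free_cpu"])["name"]

nodes = [
    {"name": "node-a", "free_cpu": 0.5, "free_mem": 2.0, "taints": []},
    {"name": "node-b", "free_cpu": 3.0, "free_mem": 8.0,
     "taints": ["dedicated=team1"]},
]
pod = {"req_cpu": 1.0, "req_mem": 1.0, "tolerations": ["dedicated=team1"]}
print(schedule(nodes, pod))  # node-b: the only node that both fits and is tolerated
```

Once `schedule` returns, this logic is done; if node-b later becomes congested, nothing in this loop moves the pod anywhere better.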
Per Kubernetes.io, “Pods do not, by themselves, self-heal. If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance. Kubernetes uses a higher-level abstraction, called a controller, that handles the work of managing the relatively disposable Pod instances. A given Pod (as defined by a UID) is never "rescheduled" to a different node; instead, that Pod can be replaced by a new, near-identical Pod, with even the same name if desired, but with a different UID.”
Eviction and rescheduling, a.k.a. “self-healing,” is not a sound strategy for assuring application performance. Stateful workloads in particular may lose availability as pods are killed, queued, and rescheduled; and if the pressure is high enough, not only do all workloads suffer, but the node itself can become unusable, forcing every pod to be rescheduled elsewhere, if capacity exists.
Thus, we believe that Kubernetes pod self-healing is a myth. For non-production pods that can withstand downtime, the native healing mechanism may be sufficient. But for mission-critical production pods, self-healing arrives too late: by the time the pod is rescheduled, performance has already suffered.
Using Kubernetes Pod Moves for Continuous Application Performance
The financial services customer mentioned above replaced its reliance on self-healing with automated pod moves. Rather than depending on the eviction-and-rescheduling loop native to Kubernetes, the customer can now automatically and proactively reschedule pods onto nodes with the resources needed to run them reliably. When no compliant node has available resources, a new node is automatically provisioned and the pod is scheduled on it.
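The move-or-provision flow described above can be sketched as a short control loop. This is an assumed simplification, not Turbonomic's actual implementation; `provision_node` is a hypothetical callback standing in for whatever mechanism adds a node to the cluster.

```python
# Sketch of a proactive move-or-provision loop (assumed control flow,
# not Turbonomic's actual implementation).

def rebalance(pod, nodes, provision_node):
    """Move the pod to a node with headroom; provision a new node if none fits."""
    fits = [n for n in nodes
            if n["free_cpu"] >= pod["req_cpu"]
            and n["free_mem"] >= pod["req_mem"]]
    if fits:
        # Proactive move: the pod is rescheduled before contention
        # triggers any eviction.
        return max(fits, key=lambda n: n["free_cpu"])["name"]
    # No compliant capacity left: scale out, then place the pod there.
    return provision_node()

nodes = [{"name": "node-a", "free_cpu": 0.2, "free_mem": 0.5}]
pod = {"req_cpu": 1.0, "req_mem": 2.0}
print(rebalance(pod, nodes, provision_node=lambda: "node-new"))  # node-new
```

The key contrast with the native loop is that the pod is never killed and re-queued; the decision to move (or to add capacity) happens ahead of the pressure that would otherwise cause an eviction.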
How is the customer doing this? They are using Turbonomic. Since employing this new technique, the customer has completely avoided evictions due to resource contention from workloads peaking together.
In fact, benchmarked against the old method of self-healing Kubernetes pods, the customer saw the following improvements:
- Over 70% reduction in performance-related complaints
- Eliminated compliance issues associated with relabeling nodes, with no manual labor required thanks to automation
- Continuously maintained node availability by avoiding the congestion that had previously made nodes unusable
No Self-Healing Required: How Turbonomic Moves Kubernetes Pods
To avoid pod-evicting node congestion, you need analysis and prescriptive actions that understand fluctuating demand and determine where and when additional resources are needed. Turbonomic uniquely solves this problem by continuously analyzing workload demand to drive placement decisions that assure performance, improve efficiency, and remain compliant with placement rules.
To move beyond reactive pod self-healing, Turbonomic uses six data dimensions from Kubernetes:
- Memory usage
- Memory requests
- CPU usage
- CPU requests
- Pod density
- Rootfs/imagefs usage
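To make the role of these dimensions concrete, here is a hypothetical sketch of how per-node readings across them might be combined into a pressure signal that triggers a proactive move. The weighting (take the most constrained dimension) and the 90% threshold are illustrative assumptions; Turbonomic's actual analysis is proprietary.

```python
# Hypothetical sketch: combine the six dimensions into a per-node pressure
# signal. The "worst dimension" rule and the 0.9 threshold are assumptions
# for illustration, not Turbonomic's actual algorithm.

DIMENSIONS = ["mem_usage", "mem_requests", "cpu_usage",
              "cpu_requests", "pod_density", "rootfs_usage"]

def node_pressure(node):
    """Return the most constrained dimension and its utilization (0.0-1.0)."""
    worst = max(DIMENSIONS, key=lambda d: node[d])
    return worst, node[worst]

def needs_rebalance(node, threshold=0.9):
    """Proactive check: act before any dimension reaches eviction territory."""
    _, util = node_pressure(node)
    return util >= threshold

node = {"mem_usage": 0.95, "mem_requests": 0.7, "cpu_usage": 0.6,
        "cpu_requests": 0.5, "pod_density": 0.4, "rootfs_usage": 0.3}
print(node_pressure(node))    # ('mem_usage', 0.95): memory is the bottleneck
print(needs_rebalance(node))  # True: move a pod before the kubelet evicts one
```

The point of watching all six dimensions is that any one of them (not just CPU or memory usage) can be the constraint that pushes a node toward eviction.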
Turbonomic also automatically discovers compliance policies governing node selection—whether node labels, taints/tolerations, or explicit affinity and anti-affinity rules—to determine which pod to move, when, and to which node.
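A minimal sketch of what such a compliance check involves, using simplified stand-ins for nodeSelector and anti-affinity (real Kubernetes specs are considerably richer, and this is not Turbonomic's code):

```python
# Simplified compliance check: a pod move is valid only if the destination
# node satisfies the pod's nodeSelector labels and no co-located pod
# violates its anti-affinity rule. (Hypothetical structures; real
# Kubernetes affinity specs are far richer.)

def compliant(pod, node, pods_on_node):
    # nodeSelector: every required label must match exactly.
    if any(node["labels"].get(k) != v
           for k, v in pod["node_selector"].items()):
        return False
    # Anti-affinity: refuse nodes already running a pod with a matching label.
    for other in pods_on_node:
        if any(other["labels"].get(k) == v
               for k, v in pod["anti_affinity"].items()):
            return False
    return True

node = {"labels": {"zone": "us-east-1a", "tier": "prod"}}
web = {"node_selector": {"tier": "prod"}, "anti_affinity": {"app": "web"}}
print(compliant(web, node, pods_on_node=[]))  # True: labels match, no conflict
print(compliant(web, node,
                pods_on_node=[{"labels": {"app": "web"}}]))  # False: anti-affinity
```

Any automated mover has to evaluate checks like these for every candidate destination, which is why discovering the policies automatically matters as clusters grow.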
Turbonomic generates Pod Move actions that show the user the main performance, compliance, or efficiency risk being mitigated, along with the projected improvement on each affected node (see Figure 1).
Figure 1: Turbonomic Pod Move Action to Address Node VMEM Congestion
The user can also see the benefits across all the compute nodes in the cluster through a before-and-after simulation of the proposed actions, providing further proof of their value (see Figure 2). In this view, the first node is highly congested on CPU (yet pods are not being evicted) while other nodes are clearly underutilized. By moving workloads around, we can safely relieve the congestion on the first node and even suspend one of the compute nodes, all while maintaining application availability, whether the workload is stateful or stateless.
Figure 2: Turbonomic Projection of Node Utilization Improvement Achieved Through Taking Actions
When no compliant node capacity is left, Turbonomic generates a preemptive Node Provision action; once it is executed, Turbonomic can move pods to the new node, assuring performance without waiting for pods to be evicted.
Self-Healing Kubernetes Pods: Myth
In conclusion, Kubernetes pod self-healing is a misnomer at best. While certain use cases, mostly Dev/Test and pre-production, can withstand the downtime of the eviction-queue-reschedule loop, self-healing pods lead to downtime, complaints, and lost revenue in production settings. We hope you will consider evaluating the potential benefits of continuous pod moves with Turbonomic. Let us give you a demonstration today!