Originally posted on VMBlog.com on October 7, 2021.
Why Setting CPU Limits Can Slow Response-Time
Today, the majority of enterprise organizations running mission-critical applications on Kubernetes are doing so in multitenant environments. These multitenant environments rely on limits to regulate tenant workloads or to support chargebacks. Some developers also set CPU limits to benchmark their applications.
CPU throttling is the unintended consequence of this design. Take a look at this example…
Figure 1: CPU with 25% utilization
In the above figure, the CPU utilization of a container is only 25%, which makes it a natural candidate to resize down.
Figure 2: Huge spike on Response Time after Resize to ~50% CPU utilization
But after we resized the container down (its CPU utilization is now 50%, still not high), the response time quadrupled!
So what’s going on here? CPU throttling occurs when you configure a CPU limit on a container, which can inadvertently slow your application's response time. Even if you have more than enough resources on your underlying node, your container workload will still be throttled because its limit was not configured properly. The high response times are directly correlated to periods of high CPU throttling, and this is exactly how Kubernetes was designed to work.
To bring some color to this, imagine you set a CPU limit of 200m (200 millicores, or 0.2 of a CPU). That limit is translated to a cgroup quota in the underlying Linux system: with the default enforcement period of 100ms, the container can use only 20ms of CPU time in each period. If your task needs more than 20ms of CPU time, it will be throttled until the next period begins. A task that needs 80ms of CPU time, for example, finishes only after about 320ms, 4x longer than it would take unthrottled.
Your application's performance will suffer due to the increase in response time caused by throttling.
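The quota arithmetic described above can be sketched in a few lines of Python. This is a simplified model, not how the kernel scheduler is actually implemented: it assumes a single-threaded task that starts at the beginning of an enforcement period, no contention from other workloads, and the default 100ms period. The function names are hypothetical and exist only for this illustration.

```python
def millicores_to_quota_ms(limit_millicores, period_ms=100.0):
    """CPU time (ms) a container may use in each enforcement period."""
    return limit_millicores * period_ms / 1000.0


def wall_clock_ms(cpu_needed_ms, quota_ms, period_ms=100.0):
    """Approximate wall-clock time for a task needing cpu_needed_ms of CPU.

    Assumption: single thread, task starts at a period boundary,
    and nothing else competes for the CPU.
    """
    if quota_ms <= 0:
        raise ValueError("quota_ms must be positive")
    if cpu_needed_ms <= 0:
        return 0.0
    full_slices, remainder = divmod(cpu_needed_ms, quota_ms)
    if remainder == 0:
        # The final slice ends exactly when its quota is used up,
        # so the task does not wait out the rest of that period.
        return (full_slices - 1) * period_ms + quota_ms
    # Each full slice costs a whole period; the remainder runs unthrottled.
    return full_slices * period_ms + remainder


quota = millicores_to_quota_ms(200)   # 200m limit -> 20ms per 100ms period
elapsed = wall_clock_ms(80.0, quota)  # task needing 80ms of CPU time
print(quota, elapsed, elapsed / 80.0)
```

Under these assumptions, a task that fits within a single 20ms slice is not slowed at all, while longer tasks spend most of their wall-clock time waiting for the next period to begin.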
How Do You Avoid CPU Throttling in Kubernetes?
CPU throttling is a key application performance metric due to the direct correlation between response-time and CPU throttling. This is great news for you, as you can get this metric directly from Kubernetes and OpenShift.
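As a rough illustration of where such a metric can come from: on a Linux node, a container's CPU cgroup exposes a cpu.stat file with nr_periods and nr_throttled counters, which record how many enforcement periods have elapsed and in how many of them the container was throttled. The sketch below computes the fraction of throttled periods between two snapshots. The sample text and the helper names are invented for this example, and this is not a description of how Turbonomic collects the metric.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(before, after):
    """Fraction of enforcement periods that were throttled between snapshots."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


# Hypothetical snapshots taken some time apart (values are made up).
snapshot_1 = parse_cpu_stat("nr_periods 1000\nnr_throttled 100\n")
snapshot_2 = parse_cpu_stat("nr_periods 1100\nnr_throttled 150\n")

print(throttled_fraction(snapshot_1, snapshot_2))
```

Comparing two snapshots rather than reading the raw counters matters because the counters only ever increase; the ratio over a recent window is what lines up with recent response times.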
To keep your application response times low and your CPU unthrottled, you first need to understand that CPU utilization alone will not tell you when throttling is occurring. You need to take into account all the analytics that go into application performance. Turbonomic has built that analytics platform.
When determining container rightsizing actions, Turbonomic is able to analyze four dimensions:
- CPU Limits
- CPU Requests
- Memory Limits
- Memory Requests
Turbonomic is able to determine the CPU limits that will mitigate the risk of throttling and allow your applications to perform unencumbered. It does this by adding CPU throttling as a dimension for the platform to analyze, managing the tradeoffs that appear. With CPU throttling included as a dimension, the platform can keep application response times low. Check out this video to see it in action.
On top of this, Turbonomic is generating actions to move your pods and scale your clusters—as we all know, it’s a full-stack challenge.
Customers can see the KPIs and ask, ‘Which of my services are being throttled?’ They can also review the history of CPU throttling for each service. Remember that each service's throttling is directly correlated with application response time! As one customer said, “This CPU Throttling has been plaguing us. What Turbo provides will save time and performance.”
The benefit of Turbonomic is our ability to quickly identify and solve a consequence of a platform strategy rather than have the customer redesign their multitenant platform strategy. Not only can Turbonomic monitor CPU throttling metrics, but the platform can also automatically rightsize your CPU limit and bring the throttling down to a manageable level.
Learn More About CPU Throttling!
If you are interested in learning more about the Kubernetes community and the adverse impact of CPU throttling, check out these articles:
- CPU limits and aggressive throttling in Kubernetes
- Kubernetes: Make your services faster by removing CPU limits
This one by Dave Chiluk is one of our favorites: Unthrottled: Fixing CPU Limits in the Cloud. Not only does he offer a nice illustration of throttling, but he also describes an interesting Linux kernel bug related to throttling and how he fixed it. Elsewhere, Dimitri Stiliadis from Palo Alto Networks wrote a program to illustrate the impact of CPU limits on an application's latency. There has been so much debate in the Kubernetes community about whether CPU limits are good or bad that even Tim Hockin has offered his guidelines.
About the Authors
Cheuk Lam, Software Architect, Advanced Engineering at Turbonomic
Cheuk Lam is a software engineer at Turbonomic. He studies cloud native technologies and develops solutions to continuously optimize workload scaling and placement in multicloud environments.
Enlin Xu, Director, Advanced Engineering at Turbonomic
Enlin Xu is a proud graduate of Columbia University and has been a software engineer at Turbonomic since 2011. He is now the Director of Advanced Engineering and leads the application of Turbonomic's analytics platform to cloud native technologies.
David Blinn, Software Architect, Advanced Engineering at Turbonomic
David Blinn is a software engineer at Turbonomic. He works on solving application performance and system scaling challenges with a focus on containerized environments.