

Latency, CPU, response time: can we treat them all the same?

Posted by Ben Yemini on Sep 9, 2016 12:01:38 PM

“Okay, I think I’ve figured this thing out. You can go up and down, but not side-to-side, or … back in time.”

Homer Simpson made this observation right after learning to maneuver a hot air balloon. Homer is correct, and so, for that matter, was Albert Einstein: space and time need to be treated differently.

So what does that have to do with your datacenter?

As you’ve probably figured out by now, our autonomic platform enables entities in the datacenter to self-manage, assuring application performance while making the best use of the underlying environment. The platform is built on a common abstraction and data model: it aggregates the right metrics and uses them to drive placement, sizing, and provisioning decisions in real time.

I was recently talking with one of my favorite Turbonomic engineers, Meir Laker, and he explained why not all metrics are created equal.

Capacity vs. Time

Most metrics (e.g., CPU, memory, storage amount, network throughput) measure utilization of a datacenter resource that has a physical (or virtual) limitation. That limitation is its capacity.

Storage latency and application response time are different. They do not measure resource utilization but quality of service; more specifically, delay in service. The larger the number, the greater the delay and the worse the performance of the workloads experiencing it. Since there is no limit to the potential size of a delay, “capacity” has no meaning here. Instead, the upper bound must be understood as a “maximum acceptable delay.”
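To make the distinction concrete, here is a minimal sketch in Python (not Turbonomic’s actual data model) of how a capacity-bounded metric and a delay-based metric can share one utilization abstraction once delay is normalized against a maximum acceptable delay:

```python
# A minimal sketch (not Turbonomic's actual data model) of the two metric
# types. A capacity metric is bounded by a physical limit; a delay metric
# is bounded only by a policy choice: the maximum acceptable delay.
from dataclasses import dataclass

@dataclass
class CapacityMetric:
    """A resource with a hard physical (or virtual) limit, e.g. CPU or memory."""
    used: float
    capacity: float  # the physical or virtual limit

    def utilization(self) -> float:
        return self.used / self.capacity

@dataclass
class DelayMetric:
    """A quality-of-service measure, e.g. storage latency or response time."""
    observed_delay_ms: float
    max_acceptable_delay_ms: float  # an SLA, not a physical limit

    def utilization(self) -> float:
        # Normalizing against the SLA puts both metric types on the same
        # 0..1+ scale; values above 1.0 mean the SLA is being violated.
        return self.observed_delay_ms / self.max_acceptable_delay_ms

cpu = CapacityMetric(used=6.0, capacity=8.0)             # 75% utilized
latency = DelayMetric(observed_delay_ms=12.0,
                      max_acceptable_delay_ms=10.0)      # 120%: SLA breached
print(cpu.utilization(), latency.utilization())          # 0.75 1.2
```

Normalizing against the SLA rather than a physical limit is what gives an otherwise unbounded delay a usable upper bound.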

Storage manufacturers publish guidelines for the expected latency of various devices; these serve as defaults in our platform and can be easily adjusted.

For packaged applications, a software provider can similarly specify an expected application response time. The maximum acceptable delay varies quite a bit by customer and application type. For example, production workloads might have stricter quality-of-service requirements than development workloads, and a consumer-facing web commerce application may have a tight response-time SLA (e.g., <500 ms per transaction) while a nightly batch run that reconciles orders just needs to finish in less than four hours.
[Figure: How to set performance SLAs in Turbonomic]

In our platform we’ve made it easy for users to set performance SLAs and apply them to any scope of workloads (see the figure above). Users can also leverage our instrumentation for specific applications (e.g., MySQL, Tomcat, Oracle DB), or pull metrics such as response time or transaction throughput from a third-party monitoring tool (e.g., Dynatrace, New Relic, Prometheus).
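As an illustration of scoped SLAs in the spirit of the figure above, here is a hypothetical sketch; the scope names, defaults, and lookup logic are invented for the example and do not reflect Turbonomic’s configuration schema:

```python
# Hypothetical response-time SLAs keyed by workload scope. The values
# echo the examples in the text: a strict SLA for consumer-facing
# commerce, looser ones for development and nightly batch work.
response_time_slas_ms = {
    "production/web-commerce": 500,            # < 500 ms per transaction
    "production/default": 2_000,
    "development/default": 10_000,             # looser QoS for dev workloads
    "batch/nightly-reconciliation": 4 * 60 * 60 * 1000,  # finish within 4 hours
}

def sla_for(scope: str) -> int:
    """Return the most specific SLA for a workload scope, falling back
    to the group default when no exact match exists."""
    if scope in response_time_slas_ms:
        return response_time_slas_ms[scope]
    group = scope.split("/")[0]
    return response_time_slas_ms.get(f"{group}/default",
                                     response_time_slas_ms["production/default"])

print(sla_for("production/web-commerce"))   # 500
print(sla_for("development/payroll-test"))  # 10000
```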

Algorithmic Approach

The Turbonomic market-based algorithm applies to both types of metrics: resource utilization and quality of service. Microeconomic principles of buyers, providers, and price translate readily to our datacenter model for utilization metrics: a provider prices a commodity higher as its utilization approaches the provider’s capacity. But the model also applies to quality-of-service metrics. Buyers (i.e., consumers) make economic decisions based on quality as well; since time is money, customers pay a higher price to shop at a convenient location. NYC residents do this all the time. But when the price of convenience gets too high, they shop in NJ!

Similarly, datacenter providers raise prices for quality-of-service metrics as service levels degrade, causing workloads that need a higher level of service to shop elsewhere.
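A toy pricing function illustrates the analogy. The inverse-square formula below is a common textbook way to model congestion pricing, not Turbonomic’s actual pricing function:

```python
# A minimal sketch of the market analogy: price rises sharply as
# utilization (or normalized delay) approaches 1.0, i.e. the capacity
# or the maximum acceptable delay.
def price(utilization: float, base_price: float = 1.0) -> float:
    u = min(utilization, 0.999)       # avoid division by zero at the bound
    return base_price / (1.0 - u) ** 2

# The same function works for a utilization metric and for a QoS metric
# once the delay has been normalized by its maximum acceptable value.
for u in (0.25, 0.50, 0.75, 0.90):
    print(f"utilization {u:.0%}: price {price(u):.1f}")
# A workload that needs better service sees the rising price and "shops
# elsewhere" -- that is, it moves to a cheaper, less loaded provider.
```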

Performance is the Goal

Quality-of-service metrics are direct measures of the performance our platform assures. Resource utilization, on the other hand, is not a direct measure of performance. In many, if not all, cases, a resource can be driven to 100% utilization with no degradation in workload performance, because the device was designed (specified) to deliver exactly that performance at full utilization.

The reason our platform does not drive a resource to 100% is to leave “headroom” for spiky behavior that might otherwise push the resource beyond its physical capacity. So, while resource utilization is not a direct measure of performance, it is an indication of performance risk: go beyond capacity, and we torpedo performance in one shot.

Effectively, then, our algorithm converts resource utilization metrics into a measure of risk to quality of service, and therefore both types of metrics are being used to assure quality of service.
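As a small illustration of that conversion, the sketch below prices utilization against an 80% headroom target; the target value is an assumption for the example, not Turbonomic’s default:

```python
# Sketch of the "headroom" idea: raw utilization is judged against an
# effective capacity below 100%, so spiky demand has room to grow
# before it hits the physical limit.
HEADROOM_TARGET = 0.80  # hypothetical effective-capacity fraction

def risk_adjusted_utilization(used: float, capacity: float) -> float:
    """Convert raw utilization into a measure of performance risk by
    scaling against the headroom target rather than full capacity."""
    return (used / capacity) / HEADROOM_TARGET

# 72% raw utilization reads as 90% of the risk budget, so the market is
# already pricing this resource high even though it is not yet full.
print(risk_adjusted_utilization(used=7.2, capacity=10.0))  # 0.9
```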

You can learn more about how to deliver quality of service with our platform here.

Topics: Industry Perspectives, Networking and SDN
