<img alt="" src="https://secure.bomb5mild.com/193737.png" style="display:none;">

Turbonomic Blog

Less Troubleshooting or Less Troubles?

Posted by Yoni Friedman on Sep 21, 2016 8:50:17 AM

It is very trendy to discuss and compare how IT monitoring tools collect data. Each vendor explains why it collects the best data, from the best sources and at the finest level of granularity. One example is VMware’s “Why data granularity matters in monitoring” blog post.

The author explains the importance of data granularity through a simple example: memory utilization and demand within a host. Looking at peak utilization measurements for a specific host over a month, he shows how the reported peak differs with the time granularity: 106% for 5-minute measurements, 57% for 1-hour measurements, and 32% for 24-hour measurements.

He then explains that storing specific data points (e.g. maximum, minimum) cannot replace high data granularity, since we never know which data point we will need for a new analysis model we might implement a week, a month or a year from now. According to the blog, whatever machine learning, adaptive baselines or analytics we end up using, we should keep the rawest, most granular data set we can.
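To make the point concrete, here is a minimal Python sketch. The numbers are invented for illustration (they are not the data from VMware’s post), but it shows how the same raw series reports very different peaks once it is averaged over coarser windows, which is exactly what a pre-aggregated roll-up would hand you.

```python
# Illustrative sketch (invented numbers): the same raw series reports very
# different "peaks" depending on how coarsely it is aggregated.
import random

random.seed(42)

# One day of hypothetical memory-demand samples, one per minute (percent of host capacity).
raw = [40 + random.random() * 20 for _ in range(24 * 60)]
raw[9 * 60 + 17] = 106.0  # a single one-minute spike

def peak_of_window_averages(series, window):
    """Average the series over fixed windows, then take the max of those averages."""
    averages = [
        sum(series[i:i + window]) / window
        for i in range(0, len(series), window)
    ]
    return max(averages)

print("1-minute peak :", max(raw))                               # sees the 106% spike
print("5-minute peak :", peak_of_window_averages(raw, 5))        # spike diluted
print("1-hour peak   :", peak_of_window_averages(raw, 60))       # diluted further
print("24-hour peak  :", peak_of_window_averages(raw, 24 * 60))  # spike averaged away
```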

According to the author, better data sets should lead to better results. But I believe that for monitoring tools, the quest for perfect data now supersedes the pursuit of better results, leading us down a perpetual chase after more data and ever more elaborate analytics. The importance the author gives to improved data sets is a double-edged sword.

Monitoring tools get so caught up in a “who has the best data” comparison race that they have no attention left for disrupting and finding better ways to manage your cloud; they can only improve what already exists. I was led to this conclusion by asking the following questions:

How granular is granular enough?

In the blog, vROps is said to collect 15 data points every 5 minutes. But had we collected 50, 150 or 300 data points in that interval, we might have seen that the 106% measurement is not the true peak. So how many data points are enough? Is there a finite point at which we will finally be content? Do you feel like your data is granular enough?
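As a rough illustration (again with made-up numbers), 15 data points per 5 minutes works out to one sample every 20 seconds, so a burst shorter than that can fall entirely between samples and never appear in the collected data at all:

```python
# Hedged sketch: how the sampling interval bounds the peak you can observe.
# The "true" signal here is hypothetical, at 1-second resolution over 5 minutes.
true_signal = [55.0] * 300
for t in range(125, 137):      # a 12-second burst of high utilization
    true_signal[t] = 110.0

def observed_peak(signal, interval_seconds):
    """Peak as seen when sampling the signal every `interval_seconds`."""
    return max(signal[t] for t in range(0, len(signal), interval_seconds))

print("Peak sampled every 1 s :", observed_peak(true_signal, 1))   # 110.0
print("Peak sampled every 20 s:", observed_peak(true_signal, 20))  # 55.0, burst missed
```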

Do we store more than we save?

Collecting more data means storing more data. So the more granular we get, the more IOPS and storage we consume. When do we reach the point where the monitoring tool uses more resources than it “saves”? Will we know when we have reached that point? Do you know the current cost of storing all your monitoring data?
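A rough back-of-envelope calculation makes the trade-off visible. All figures below are assumptions chosen for illustration (environment size, metric count, bytes per sample), not any vendor’s real numbers, but the growth pattern is the point:

```python
# Rough, illustrative arithmetic: how raw sample volume grows as collection
# intervals shrink. Every figure here is an assumption, not a measured value.
vms = 2000                 # hypothetical environment size
metrics_per_vm = 100       # counters collected per VM
bytes_per_sample = 16      # timestamp + value, uncompressed
retention_days = 365

def yearly_storage_gb(sample_interval_seconds):
    samples_per_metric = retention_days * 24 * 3600 / sample_interval_seconds
    total_bytes = vms * metrics_per_vm * samples_per_metric * bytes_per_sample
    return total_bytes / 1e9

for interval in (300, 60, 20, 1):   # 5 min, 1 min, 20 s, 1 s
    print(f"{interval:>4}-second samples: ~{yearly_storage_gb(interval):,.0f} GB/year")
```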

How have we improved?

Our monitoring tools have become so complex that in many cases their insights aren’t easy to act upon. That is why most monitoring tools don’t surface every result; instead, they use these advanced insights to suppress alerts they believe are false, leaving the operator with fewer alerts. They save time and help us focus on the true alerts, which is good! But did they make the process of resolving those alerts any easier? When was the last time they helped you change the way you handle contention and failures?

Understanding and answering these questions is important, because they touch the most crucial aspects of managing your environment: what you spend your budget on, what you spend your time doing, and whether you improve your environment’s performance in the process. For me, the most important question is – what is your goal? Less troubleshooting or less troubles?

Topics: Industry Perspectives, Virtualization
