If you’re like me, you probably had, or know someone who had, a house with a room that nobody was allowed to go into except on “special occasions”. The furniture was often covered in plastic, and the carpet was impeccably clean, vacuumed weekly even though nobody walked on it except the person pushing the vacuum.
This same practice is happening in your data center today. Imagine an environment with multiple hardware platforms, storage for example. You purchase a brand new all-flash array, and the engineering team is preparing to add it to the infrastructure pool.
The problem is that because it is a new, low-latency, high-I/O storage unit, the team wants to allow only certain workloads to run on it. This tactic of “reserving” the good storage for the virtual machines that supposedly deserve it is flawed. Let’s talk about why.
Trying to Outsmart the Workload
If you look at your virtual environment today, you can probably pick out the “busy application” according to criteria set by you and your team. That choice may be driven by what the application owner has told you, because the business sees the application as the most important part of its systems environment. On its face, this is true, but that kind of importance is intangible: it is not a measurement of resource demand.
I have been in a number of organizations where the presumed Tier 1, most-used application was actually not as I/O-intensive as the business owner believed. In other words, we applied an artificial constraint on the environment because we allowed intangible factors to decide the level of importance. Those factors are still very important and should always be considered when building the SLA (Service Level Agreement) for our applications.
SLA is Support, Not Performance Related
This is where we transition to understanding the real purpose of the SLA. A Service Level Agreement is applied to the application to prioritize its uptime, and if you read the SLA for most applications, performance may not be mentioned at all. This is where things get interesting.
In the process of moving only the Tier 1 application to the new all-flash array in our example, we have neglected the fact that the Tier 1 application may not win by being on the new low-latency storage. What we may discover in practice is that the Tier 2 applications are running high I/O databases that would benefit greatly from the capabilities of the all-flash array.
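To make the point concrete, here is a minimal sketch that ranks workloads by measured I/O rather than by their business tier. The workload names, tier numbers, and IOPS figures are all hypothetical sample data, not measurements from any real environment; in practice the numbers would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str        # hypothetical application name
    tier: int        # business-assigned tier (1 = most important)
    avg_iops: float  # measured average I/O operations per second


def rank_by_measured_io(workloads):
    """Return workloads sorted from highest to lowest measured IOPS."""
    return sorted(workloads, key=lambda w: w.avg_iops, reverse=True)


# Hypothetical sample data, for illustration only.
workloads = [
    Workload("erp-frontend", tier=1, avg_iops=800),
    Workload("reporting-db", tier=2, avg_iops=12000),
    Workload("analytics-db", tier=2, avg_iops=9500),
    Workload("file-share", tier=3, avg_iops=300),
]

ranked = rank_by_measured_io(workloads)
for w in ranked:
    print(f"{w.name:15} tier={w.tier} avg_iops={w.avg_iops:>8.0f}")
```

With this sample data, the two Tier 2 databases land at the top of the list and the Tier 1 application lands third, which is exactly the mismatch between assigned importance and actual demand described above.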
If the same practice were applied to managing passenger traffic on a train, it would be like having a five-car train at full capacity and, when the transit authority adds a sixth car, allowing only certain people to use it rather than letting people flow naturally across all of the cars.
Every Field was a Green Field
You’ve probably heard the term green field deployment. It refers to putting all new infrastructure in place in the data center for building out the new environment. Much like the room with plastic-covered couches, eventually the green field is no longer green. This is what we have learned to accept in order to get the best use of all of the capacity of the house, or in our case, the data center.
Adding new infrastructure and assigning policies around what workloads can go there has been, and continues to be, a common practice. The thing that we neglect during this practice is that the workload that is high utilization right now may not be in a day, or a month, or 3 months. This is why we have so much conversation happening around the SDDC (Software-Defined Data Center). SDDC is the practice of abstracting (aka virtualizing) the hardware to present pools of resources to the workload.
Abstracting our storage and presenting it to the hypervisor gives the entire virtualization infrastructure equal access to it. This means that the entire application base could make use of the faster storage in this example, or the faster networking in another case, or a faster CPU if that were the advantage we were creating.
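As a rough illustration of what pooled access means, the sketch below places workloads across all datastores based only on demand and capacity, with no tier policy involved. It is a toy greedy heuristic written for this article, not how any particular hypervisor or product makes placement decisions, and the datastore names, capacities, and IOPS demands are hypothetical.

```python
def place_pooled(demands, capacities):
    """Greedily place each workload on the datastore that would have the
    lowest utilization after the placement.

    demands:    {workload_name: iops_demand}
    capacities: {datastore_name: iops_capacity}
    Returns (placement, utilization) dictionaries.
    """
    load = {store: 0.0 for store in capacities}
    placement = {}
    # Place the heaviest workloads first so they get the most headroom.
    for name, iops in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        best = min(capacities, key=lambda s: (load[s] + iops) / capacities[s])
        load[best] += iops
        placement[name] = best
    utilization = {s: load[s] / capacities[s] for s in capacities}
    return placement, utilization


# Hypothetical capacities and demands, for illustration only.
capacities = {"all-flash": 100_000, "hybrid": 20_000}
demands = {
    "erp-frontend": 800,
    "reporting-db": 12_000,
    "analytics-db": 9_500,
    "file-share": 300,
}

placement, utilization = place_pooled(demands, capacities)
for name, store in placement.items():
    print(f"{name:15} -> {store}")
```

In this made-up example the two busy databases end up on the all-flash array and the lighter workloads stay on the hybrid array, regardless of which one the business calls Tier 1. Rerunning the placement as demand changes is what keeps the new hardware from becoming the room nobody is allowed to use.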
In other words, the system is surprisingly capable of delivering the supply of capacity where it is needed, so we shouldn’t try to make assumptions and apply artificial constraints. This is the core of what we talk about here at Turbonomic: assuring application performance while maximizing infrastructure efficiency.
No matter how smart my operations team was, we could never outthink the infrastructure. Maybe it’s time that we take the plastic off of the couches.