Using the Network as an Analogy to Describe the Opportunity
Quality of Service (QoS) management is not a new concept. IT organizations have been applying QoS controls in networks for many years, and this has driven the emergence of multi-billion-dollar companies like Riverbed.
QoS in the WAN has resulted in the ability to intelligently multiplex the bandwidth so applications can share the available resources without service degradation, while maximizing the efficiency of WAN connections. This has radically changed the total cost of ownership of running application services over wide area networks.
So what about the data center?
In the data center, virtualization enables resource sharing across a diverse portfolio of applications. Today, enterprises run development and production applications at scale in these shared environments: clusters of shared compute and storage resources host multiple application workloads, all enabled through server virtualization. These clusters of resources represent the bandwidth of the data center, and the workloads represent the traffic that traverses the network, consuming that bandwidth.
To extend the WAN analogy, the individual compute nodes and storage LUNs/volumes are like channels in the bandwidth, and the individual resources associated with these channels (e.g. memory, CPU, I/O, network) are sub-channels of capacity within them.
Because each application workload has different characteristics, both in the types of resources it consumes (e.g. memory, CPU, I/O, network) and in the rate at which it consumes them over time, how these channels and sub-channels of bandwidth are utilized depends radically on how workloads are placed onto them over time.
This is a problem for software, not humans, to solve, because it is a multi-dimensional problem that changes in real time with the volatility of workload demand. Chris Swan, CTO for Delivery at DXC, refers to this as a "bin packing problem" in his blog on Virtual Machine Capacity Management.
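The bin-packing framing can be sketched in a few lines of code. Here is a minimal first-fit placement routine that treats each node as a multi-dimensional "channel" of capacity; all names, capacities, and workload demands are illustrative numbers invented for this sketch, not any vendor's algorithm:

```python
# Sketch of workload placement as a vector bin-packing problem.
# Each workload demands cpu/mem/io, and each node offers a fixed
# capacity in the same three dimensions (illustrative numbers only).
from typing import Dict, List, Optional

CAPACITY = {"cpu": 16.0, "mem": 64.0, "io": 1000.0}  # one node's "channel"

def fits(node_load: Dict[str, float], demand: Dict[str, float]) -> bool:
    """A workload fits only if EVERY resource dimension has headroom."""
    return all(node_load[r] + demand[r] <= CAPACITY[r] for r in CAPACITY)

def first_fit(workloads: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Place each workload on the first node with room; open a new node otherwise."""
    nodes: List[Dict[str, float]] = []
    for w in workloads:
        target: Optional[Dict[str, float]] = next(
            (n for n in nodes if fits(n, w)), None)
        if target is None:
            target = {r: 0.0 for r in CAPACITY}  # "buy" another node
            nodes.append(target)
        for r in CAPACITY:
            target[r] += w[r]
    return nodes

demo = [
    {"cpu": 8.0, "mem": 8.0,  "io": 100.0},  # CPU-heavy workload
    {"cpu": 2.0, "mem": 48.0, "io": 100.0},  # memory-heavy workload
    {"cpu": 8.0, "mem": 8.0,  "io": 100.0},
    {"cpu": 2.0, "mem": 16.0, "io": 100.0},
]
print(len(first_fit(demo)))  # → 2 nodes for this placement order
```

Even this toy version shows why the problem is hard: the answer depends on the order and mix of workloads, and real demand changes minute by minute, so a static placement that was good an hour ago can be wasteful or unsafe now.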
So you are probably saying to yourself at this point, why should I care?
This topic sounds like the subject of a research project at Columbia or Stanford University. Well, not really, and here is why.
This is an important problem to address because many things happen in the data center that are not foreseen, and the ability to respond to this adversity is critical to maintaining QoS and service consistency. Examples include:

- sudden increases in workload demand caused by an unforeseen business event;
- newly provisioned workloads that behave in ways that were not anticipated;
- software upgrades that change the behavior of applications;
- new security software that places greater overhead on the infrastructure;
- the failure of compute nodes, which reduces the bandwidth available to run existing workloads.
In an attempt to address this challenge, hypervisor vendors added scheduling algorithms to their platforms to handle workload placement, but the reality is they do not provide a holistic solution, because they focus on a single dimension of the puzzle rather than the whole. This becomes evident when you benchmark their effectiveness, something I will highlight with real-world examples in a subsequent post.
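To see why a single-dimension view falls short, consider a toy comparison between a policy that balances only CPU and one that checks every resource dimension. Both policies, the node numbers, and the demand figures are invented for illustration and do not model any specific vendor's scheduler:

```python
# Why single-dimension scheduling falls short: a scheduler that balances
# only CPU can happily co-locate workloads onto a memory-saturated node.
# All policies and numbers here are illustrative, not a vendor's algorithm.

CAPACITY = {"cpu": 16.0, "mem": 64.0}

def pick_node_cpu_only(nodes, demand):
    """Single-dimension policy: choose the node with the most free CPU."""
    return max(nodes, key=lambda n: CAPACITY["cpu"] - n["cpu"])

def pick_node_vector(nodes, demand):
    """Multi-dimension policy: require headroom in every resource first."""
    candidates = [n for n in nodes
                  if all(n[r] + demand[r] <= CAPACITY[r] for r in CAPACITY)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: CAPACITY["cpu"] - n["cpu"])

nodes = [{"cpu": 2.0, "mem": 60.0},   # node 0: idle CPU, memory nearly full
         {"cpu": 12.0, "mem": 8.0}]   # node 1: busy CPU, plenty of memory
demand = {"cpu": 2.0, "mem": 16.0}

bad = pick_node_cpu_only(nodes, demand)
print(bad["mem"] + demand["mem"] > CAPACITY["mem"])  # → True: memory blown

good = pick_node_vector(nodes, demand)
print(good is nodes[1])  # → True: vector-aware policy avoids the overload
```

The CPU-only policy picks node 0 because it has the most free CPU, and in doing so pushes that node past its memory capacity; the multi-dimension policy rules node 0 out and places the workload safely on node 1.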
Today, organizations try to address this gap by over-provisioning physical compute and storage resources, and by implementing sophisticated monitoring and incident-management processes, so that humans can respond quickly when the service is at risk and "scream for help" from the best experts in the IT department to recover the situation. These strategies carry a significant overhead in both risk and cost, which is accepted today because there is no better alternative.
What if you could address this challenge differently, in software, and get a different set of business outcomes?
- Use your existing data center bandwidth far more effectively to maintain consistent workload performance, even in the face of adversity, without human intervention.
- At the same time, utilize your assets more efficiently, giving you more headroom to respond to new business demands for compute capacity and changing the cost curve of the ongoing investment required to operate your data center and deliver IaaS, PaaS and VDI.
- Or perhaps you want to avoid investing any more money in your on-premises infrastructure and plan to exploit the public cloud, while still maximizing the investments you have already made to avoid unnecessary monthly cloud expenses.
Having quality-of-service controls in your virtualized data center provides a unique way to do things differently, one that keeps your business safe and minimizes the TCO of data center services, so that manpower and dollars can be redirected to things that make your business more productive.