Chris Williamson

Is AWS the answer to your performance management problems?

Amazon Web Services (AWS) is a bit of an odd bird. If I had told you 15 years ago that the company delivering your Windows for Dummies book would soon be delivering about half of public-cloud-based IT services around the world, I wouldn’t have blamed you for calling me a dummy. Joking aside, AWS posted about $6BN in revenue last year, and they bring in more revenue than their four closest competitors combined while growing at about a 50% YOY clip. Not bad. The question at this point is why? Why is AWS continuing to support such a massive market so well, while growing at such an astronomical pace? It’s because of three main tenets that permeate IT and virtualization management: performance, efficiency, and agility.

Three Crumbling Pillars of Virtualization

I would bet you dollars to donuts that any organization that leverages AWS regularly does so because its existing IT organization was not maximizing at least one—if not all three—of these critical objectives. Let’s dive deeper and again ask, why? The fact of the matter is that IT as an industry was established on a break-fix premise. Consider this loop:

[Figure: AWS performance management loop]

This marvelous illustration shows the perpetual loop that IT operations embraces, and has for the last fifteen or so years. Ask any IT admin how many alerts or tickets he or she receives daily or weekly. I’d guess you’d get a chuckle followed by some inconclusive answer. We as IT professionals are conditioned to want more of what we have: more visibility, more data, more reports, more alerts. One root of the performance management problem is the inability of human resources to scale: there is simply too much data and too many alerts for humans to reasonably address. In other words, the problem is too complex. The result? We only address the most critical alerts.

Visibility + Alerts ≠ Performance

Business cannot perform this way, and management recognizes it. Imagine if the airline industry ignored all of the opportunities to make minor adjustments to ensure a smooth flight and, instead, waited for systems to fail and the plane to be in freefall to actually begin addressing the outstanding issues. I’m not getting on that flight. So why do we operate infrastructures this way? Firefighting (only) the most critical issues, after performance has been degraded, does not allow IT to effectively deliver the service businesses expect, and IT certainly can’t be efficient in doing so. As a result of this reactive nature, the frustration it generally fosters within organizations, and the way it impairs corporate agility, many businesses have turned to AWS as a resolution. It provides an apparently easy and consistent solution to a problem perceived (most often) by those outside of IT.

Money is not the Answer (Ask Your CFO)

However, those in IT often believe they have the answer: rather than throwing money at AWS, perhaps we can throw (a bit less) money into our own data center and increase our resources. The common misconception is that more resources solve most if not all performance issues. Most of the time, though, that’s not the case. An overprovisioned infrastructure is not guaranteed to deliver service, especially if we continue to embrace this reactive management cycle. After all, even with an abundance of resources, what are we doing but waiting for something to break so we can go fix it? While your organization may not be forking over its dollars to AWS, it’s still overspending in the interest of performance and agility.

In the end, AWS is preying on the notion that maintaining an infrastructure that is both performant and efficient simultaneously is so elusive that it appears an impossible goal. Even if we were to achieve that nirvana it’s only for a fleeting moment. Until the next alert.

Sizing, Placement, and Capacity: Where, When, and Why?

Now what if we flip the script? What if we Think Different? Virtual workloads (e.g., servers and desktops) deliver service to the business, so we should ultimately be most concerned with those. Now let’s make sure they are sized and placed the right way, in the right place, at the right time so that they can deliver service. That implies that they can readily consume the resources from the underlying infrastructure. But we already outlined how complex that problem is, so how can this be achieved consistently?

Software. In an industry that is literally defined by computers and software, we still rely on error-prone humans to attempt to make real-time decisions with thousands of dynamic data points. (Case in point: 4 typos in that sentence alone, all corrected by spellcheck.) Instead, we can leverage a combination of human ingenuity (the aforementioned workload-first approach) and scalable software that continually executes preventative decisions to enable data centers to be kept healthy rather than be designed to break.
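To make the idea of a "preventative decision" a little more concrete, here is a minimal sketch in Python. It is not any particular product's algorithm: the `Host` class, the `rebalance` function, the single CPU dimension, and the 70% utilization target are all assumptions made up for illustration. The point is only that software can move a workload before a host is saturated, instead of waiting for an alert.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host model: one resource dimension (CPU) for simplicity."""
    name: str
    cpu_capacity: float
    workloads: dict = field(default_factory=dict)  # workload name -> CPU demand

    @property
    def used(self) -> float:
        return sum(self.workloads.values())

    @property
    def utilization(self) -> float:
        return self.used / self.cpu_capacity


def rebalance(hosts, target=0.7):
    """Move workloads off hosts that exceed `target` utilization.

    `target` is an assumed headroom threshold, not an industry constant.
    Returns a list of (workload, source_host, destination_host) moves.
    """
    moves = []
    for src in hosts:
        # Try the largest workloads first; sorted() copies the items,
        # so it is safe to modify src.workloads inside the loop.
        for wl, demand in sorted(src.workloads.items(), key=lambda kv: -kv[1]):
            if src.utilization <= target:
                break  # this host is healthy again
            # Destinations must stay at or below the target after the move.
            candidates = [
                h for h in hosts
                if h is not src and (h.used + demand) / h.cpu_capacity <= target
            ]
            if not candidates:
                continue  # nowhere to put this workload; try a smaller one
            dst = min(candidates, key=lambda h: h.utilization)
            del src.workloads[wl]
            dst.workloads[wl] = demand
            moves.append((wl, src.name, dst.name))
    return moves


# Host "a" is at 90% utilization, host "b" at 10%.
hosts = [
    Host("a", 10, {"web": 4, "db": 5}),
    Host("b", 10, {"cache": 1}),
]
print(rebalance(hosts))  # [('db', 'a', 'b')]
```

A real system would have to weigh memory, storage, and network together, and account for the cost of each move. The sketch only shows the shift in posture: act while there is still headroom, not after something breaks.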

A Healthy Infrastructure…

Step back and think about it. The business you serve as an IT operator doesn’t tolerate outages and latency, so why do we? We should be operating under the presumption that our infrastructures can be kept perpetually healthy and in a “good state.” If we can achieve this, then we can become more efficient as well, supporting a higher number of workloads on the same (or less) hardware. Now we are maximizing performance and efficiency of our infrastructure, the performance and efficiency of our people, and enabling our business to increase its agility and, in turn, its competitive advantage.

…Maximizes Performance and Efficiency

AWS is relying on IT professionals to turn their palms up when management asks them to “do more with less,” to increase SLAs and reduce downtime, or when devs don’t feel as though their needs are being met or prioritized. And yes, the public cloud provides a good stopgap solution when additional resources are truly needed on-demand or for a brief period of time (e.g., retail businesses that see 5x traffic during the six weeks from Thanksgiving to Christmas). However, IT doesn’t need to rely on these platforms as a long-term solution to the performance-efficiency-agility problem that the industry almost unanimously faces. Organizations have invested millions in staff, hardware, and real estate so their IT departments can serve the business. The issue at hand is, what platforms can you leverage to make better use of these investments and enable internal IT to be performant, efficient, and agile? When organizations like this spend on AWS, it’s not an investment but a parachute, and a costly one at that. These organizations may not need the public cloud. These organizations need control of their own virtualized data center.