Ryan Strehlke

To Scale Up or Scale Out? That is the question

Whether to scale up or scale out is an important question at every level of the IT stack.

The demands placed on virtual data centers are not only ever-fluctuating but also ever-growing. As business-critical workloads multiply and vendor requirements continue to trend upward, applications consume more and more resources, and those resources may or may not be readily available.

If we don’t have the resources, we must decide between scaling up and scaling out. Is it best to increase the size of the instance that runs a particular workload, or to spin up an entirely new instance? This is often a complex decision, and it applies to every level of the IT stack. As VMware author Frank Denneman explains, “Scalability applies not only to capacity and performance, but also the financial aspect and how well the architecture can deal with change over time.”

For example, what if we needed additional computing power within the data center? Scaling out would mean buying an additional server to spread the application processing load, while scaling up would mean upgrading the current hardware components or replacing the server itself with a beefier, more expensive version.

[Figure: visualization of the difference between scale up and scale out]

I’ll ask again: which is the correct choice? Well, like many decisions in the data center, it depends. Adding an entirely new server, as opposed to swapping one out for a more powerful one, will take up more space and result in higher cooling and electricity costs. However, it will provide more redundancy, and therefore fewer lost workloads if a server fails. For a single host, the choice may not make much of a difference; at large scale, however, the cost and performance differences would be much more substantial.
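To make the redundancy point concrete, here is a minimal sketch of the worst case when one server fails. It assumes workloads are spread as evenly as possible across identical servers and that nothing restarts them elsewhere; the function name and the figure of 40 workloads are made up for illustration.

```python
import math


def workloads_lost_on_failure(total_workloads: int, servers: int) -> int:
    """Worst-case number of workloads lost when a single server fails.

    Assumes an even spread across servers and no automatic failover
    (both are simplifying assumptions, not claims about any real cluster).
    """
    if servers < 1:
        raise ValueError("at least one server is required")
    # The busiest server holds the ceiling of the even split.
    return math.ceil(total_workloads / servers)


# Hypothetical example: 40 workloads on one large server (scale up)
# versus the same 40 workloads on two servers (scale out).
scaled_up_loss = workloads_lost_on_failure(40, servers=1)
scaled_out_loss = workloads_lost_on_failure(40, servers=2)
print(f"scale up, one server fails:  {scaled_up_loss} workloads lost")
print(f"scale out, one server fails: {scaled_out_loss} workloads lost")
```

The sketch covers only the failure side of the trade-off; the extra space, cooling and electricity costs of the second server would have to be weighed against it separately.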

What if we go even further down the stack and apply the same dilemma to storage, the foundation of the infrastructure? Regardless of how much compute is allocated, workloads won’t be able to perform without open access to storage resources such as IOPS and disk availability. So it’s critical to prevent bottlenecks from occurring, rather than simply waiting for resource contention points and reacting to them. Unfortunately, without letting software control software, these bottlenecks are difficult enough to identify, let alone prevent.

If we look at the potential storage contention points below, the complexity behind these bottlenecks becomes apparent.

[Figure: scale up vs scale out, storage contention points]

Contention points can exist at any level of the storage infrastructure – the disk pool, controller or datastore – and spread latency upward to the VMs themselves. Now, if we think about scaling storage, do we scale up or out? Is it better to increase the size of the VMDK, or create a separate drive? How about carving out a new aggregate versus sizing one up? Well, it depends entirely on the scenario. More importantly, it depends on the demand of the workloads themselves. If we have a workload with a peak IOPS demand of 3000, then cloning an instance of storage that only has 2500 available IOPS wouldn’t make much sense. However, that may be the right decision to handle multiple, less intensive workloads. Without a software-controlled solution, how would we even know where to begin?
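The IOPS comparison above can be sketched as a simple fit check. This is an illustration only: the 3000 and 2500 figures come from the example, while the `Workload` type, the `fits` helper and the three lighter workloads are hypothetical. Summing peak demands assumes all peaks coincide, which is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    peak_iops: int  # highest IOPS the workload is expected to demand


def fits(workloads: list[Workload], available_iops: int) -> bool:
    """Return True if the combined peak demand fits the available IOPS."""
    return sum(w.peak_iops for w in workloads) <= available_iops


# 2500 available IOPS on the cloned storage instance, as in the example.
CLONE_AVAILABLE_IOPS = 2500

# One intensive workload with a 3000 IOPS peak, as in the example.
heavy = [Workload("heavy-db", 3000)]
# Several lighter workloads; these names and values are invented.
light = [Workload("web-1", 800), Workload("web-2", 700), Workload("batch", 900)]

heavy_fits = fits(heavy, CLONE_AVAILABLE_IOPS)  # 3000 exceeds 2500
light_fits = fits(light, CLONE_AVAILABLE_IOPS)  # 2400 is within 2500
print(f"heavy workload fits: {heavy_fits}")
print(f"lighter workloads fit: {light_fits}")
```

A static check like this only works when peak demand is known in advance; with demand that changes in real time, the inputs themselves keep moving, which is the difficulty the paragraph describes.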

Lastly, we can look at the question of scale up versus scale out at the VM and application levels. Let’s take a workload that is often difficult to manage: Microsoft SQL Server. If your relational database grows to the point where its performance deteriorates given the compute resources allocated to its VM, it’s time to make a decision. Do we add virtual resources to the VM itself, or do we spin up an entirely new SQL instance? We may incur additional licensing costs from scaling out but, in turn, could see better performance. Likewise, managing an entirely new SQL instance manually would be more difficult, but it would eliminate a single point of failure. Without comprehensively analyzing the real-time demand of this SQL application and conducting a cost-benefit analysis manually, how could we make an accurate decision ourselves?

So when it comes to scaling up or scaling out, the decision is yours… but do you really want it to be? Wouldn’t it be easier to let software make the scale up vs scale out choice for you?