Turbonomic Blog

Vertical Scaling in a Horizontal World

Posted by Ben Yemini on Sep 30, 2019 11:00:00 AM

One of the key concepts in Jim Collins’s book ‘Built to Last’ is that important decisions are often not binary. For example, you don’t have to decide between being disciplined or creative. You also don’t have to decide between empirical analysis or decisive actions. The same notion should be applied to designing and building application environments that are built to last.

Following Jim’s concept of the tyranny of the “Or” and the genius of the “And,” one of the flawed “Or” debates in architecting and managing modern applications is whether you need to scale vertically or horizontally.

Some would say that you only need to scale horizontally, and vertical scaling is a monolithic approach (at some point monolithic became a four-letter word). Now don’t get me wrong, horizontal scaling is fundamental to building modern applications. And how you manage horizontal scaling and balance the load becomes a key consideration after you’ve broken an application into its services and leveraged an approach that allows you to manage those services.

On the other hand, queuing theory suggests that vertical scaling is best for performance optimization: if you can serve the demand effectively with one line, you don’t pay the performance penalty and delay of splitting the demand (that is, routing it) across multiple lines. As an aside, the same theory has also been proven in grocery stores and coffee shops.
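To make the queuing argument concrete, here is a minimal sketch using the textbook M/M/1 model. It assumes Poisson arrivals and exponential service times, and it compares one server that is k times faster against k separate lines that each receive 1/k of the demand. The function names and the numbers are illustrative only, and real workloads will not match the model exactly.

```python
def mean_time_one_fast_line(arrival_rate, service_rate, k):
    """Mean time in system for a single M/M/1 queue whose server is k times faster."""
    capacity = k * service_rate
    if arrival_rate >= capacity:
        raise ValueError("demand exceeds capacity; the queue grows without bound")
    return 1.0 / (capacity - arrival_rate)


def mean_time_k_split_lines(arrival_rate, service_rate, k):
    """Mean time in system when demand is split evenly across k independent M/M/1 queues."""
    per_line_arrivals = arrival_rate / k
    if per_line_arrivals >= service_rate:
        raise ValueError("demand exceeds capacity; the queues grow without bound")
    return 1.0 / (service_rate - per_line_arrivals)


# Illustrative numbers: 8 requests/s of demand, each small server handles 3 requests/s.
one_line = mean_time_one_fast_line(arrival_rate=8, service_rate=3, k=4)
split_lines = mean_time_k_split_lines(arrival_rate=8, service_rate=3, k=4)
print(one_line, split_lines)  # 0.25 1.0
```

With the same total capacity, the single fast line responds k times faster in this model. That gap is the price of splitting demand, which is why getting the size of each instance right still matters in a horizontally scaled group.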

So, should you scale vertically or horizontally?

If you read this far, you know the answer is that you have to do both. It’s not an OR debate; it’s an AND. To build an application environment that lasts, you need to design for both.

With Turbonomic 6.3 we added Consistent Scaling, a feature designed to drive the best vertical scaling decisions for a group of VMs that need to be sized the same. This is typically a requirement for a horizontally scaled application managed with AWS Auto Scaling groups or for a high availability architecture leveraging Azure Availability Sets. With our 6.4 release we’ve extended this functionality to Azure Scale Sets as well as containers.

With Consistent Scaling enabled for a group of entities (VMs or containers, in the cloud or on-prem), Turbonomic resizes all of the group members to the same size. All VMs that are part of Azure Availability Sets, Azure Scale Sets and AWS Auto Scaling groups are automatically represented as a group in Turbonomic. Consistent Scaling is enabled for those groups by default. You can also turn it off in the policy tab.

Now how can you leverage vertical scaling decisions to design an environment that scales both vertically and horizontally?

Let’s look at an example for AWS Auto Scaling groups (ASGs). In ASGs you define the size of an EC2 instance through a launch config. The ASG process then ensures all EC2 instances that are part of the group are launched using the specified size. You need to specify the min, max and desired number of EC2 instances in the group. You also choose whether to leverage horizontal scaling, typically based on a target resource utilization, or keep the group size as is. ASGs also provide health checks that automatically discover unhealthy instances, terminate them and launch replacement instances to ensure high availability.
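The settings described above can be sketched as a small data structure. This is an illustration only: `GroupConfig` and `scaled_desired` are hypothetical names, the instance type is just an example value, and the proportional rule is a simplified stand-in for utilization-based scaling, not AWS’s actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupConfig:
    instance_type: str  # the size from the launch config (the vertical dimension)
    min_size: int       # lower bound on the number of instances
    max_size: int       # upper bound on the number of instances
    desired: int        # current target number of instances (the horizontal dimension)

    def __post_init__(self):
        if not 0 <= self.min_size <= self.desired <= self.max_size:
            raise ValueError("expected 0 <= min_size <= desired <= max_size")


def scaled_desired(cfg, current_utilization, target_utilization):
    """Assumed proportional rule: grow or shrink the group toward the target
    utilization, then clamp the result to the [min_size, max_size] range."""
    raw = math.ceil(cfg.desired * current_utilization / target_utilization)
    return max(cfg.min_size, min(cfg.max_size, raw))


cfg = GroupConfig(instance_type="m5.large", min_size=2, max_size=6, desired=3)
print(scaled_desired(cfg, current_utilization=80, target_utilization=50))
```

Note that nothing in this sketch ever changes `instance_type`. Horizontal scaling only moves `desired` between the bounds, while the instance size stays whatever the launch config says.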

What ASGs lack is a way to determine the correct size to specify for the EC2 instances. Getting that decision wrong can create performance bottlenecks if you underprovision, or waste money if you overprovision or have unused RI inventory. Turbonomic’s analytics ensure you get the best performance for the app running on the group of entities at the lowest possible cost.

So how does Turbonomic work in concert with AWS Auto Scaling to provide vertical and horizontal scaling?

Once you leverage Turbonomic’s decision for the correct EC2 instance size, there are different approaches to propagating a vertical resize to a group of multiple EC2 instances. Orchestrating the change should be done in alignment with how you manage ASGs and the level of disruption you can tolerate. Here is an example for zero downtime leveraging Terraform.

Below is an event-based approach which increases the size of the group by one to minimize downtime and then leverages the ASG health check processes to terminate one instance at a time and launch a new one that conforms to the desired size.

With this approach the following workflow transpires.


A Turbonomic scaling action updates the ASG launch configuration. The change to the ASG triggers a CloudWatch event, which calls a Lambda function. The Lambda function executes a Python script that works in conjunction with the ASG processes (e.g. health checks, termination, launch) to conform every member of the group to the desired size. You can access the script, with more details on setup, here. If you’ve got suggestions for improvement, let us know.
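To illustrate the idea, here is a small in-memory simulation of that replacement loop. It is not the script referenced above and makes no AWS calls. The `Group` class, its `reconcile` method and the `conform_group` function are hypothetical stand-ins for the ASG and its launch process, and restoring the original group size at the end is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Group:
    launch_size: str                               # size new instances are launched with
    desired: int                                   # target number of instances
    instances: dict = field(default_factory=dict)  # instance id -> size
    _ids: count = field(default_factory=count, repr=False)

    def reconcile(self):
        """Stand-in for the ASG launch process: launch until the count matches desired."""
        while len(self.instances) < self.desired:
            self.instances[f"i-{next(self._ids)}"] = self.launch_size


def conform_group(group, new_size):
    """Replace every instance with one of new_size, one at a time.

    Returns the lowest instance count seen during the rollout."""
    original_desired = group.desired
    group.launch_size = new_size   # the vertical scaling decision
    group.desired += 1             # temporarily grow the group by one
    group.reconcile()
    lowest = len(group.instances)

    stale = [iid for iid, size in group.instances.items() if size != new_size]
    for iid in stale:
        del group.instances[iid]   # terminate one non-conforming instance
        lowest = min(lowest, len(group.instances))
        group.reconcile()          # the group launches a replacement at the new size

    # Assumption of this sketch: shrink back to the original size afterwards.
    group.desired = original_desired
    while len(group.instances) > group.desired:
        group.instances.popitem()  # every remaining instance already conforms
    return lowest


group = Group(launch_size="small", desired=3)
group.reconcile()
lowest = conform_group(group, "large")
print(lowest, sorted(group.instances.values()))
```

Because the group is grown by one before anything is terminated, the number of running instances never drops below the original desired count while the members are swapped out.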

Hopefully at this point you’re convinced that vertical or horizontal scaling is the wrong debate. You need both.

