As I write this, every major operating system and virtualization vendor has released, or is about to release, patches for the recent Meltdown and Spectre vulnerabilities (Forbes keeps a list of the available patches: Here Are All The Available Fixes You Need For Those Huge Chip Hacks). We covered the initial disclosure of the exploits here recently (Preparing for the Uncertainty and Risk of Meltdown and Spectre), and the effects of patching are becoming clearer. Many cloud and on-premises users report a measurable impact after applying the patches. There are no common or average figures yet, but the reports describe a real and noticeable reduction in performance.
We’ve seen CPU usage go from ~20% to ~40% (and now critical machines with redundancy upscale under loads that before didnt made them blink). Costs this month in AWS will go up 10%, I predict (very least, haven’t checked EMR effect yet, if similar, 20-30%) #spectre #meltdown #fb
— Ruben Berenguel, PhD (@berenguel) January 6, 2018
Many organizations are publishing details on the issue (Meltdown and Spectre: How chip hacks work). Vendors are sharing with the community how they are handling it and what results they see (How Red Hat is Dealing with Spectre CPU Meltdown), which validates many early assumptions about the performance impact of the patches. There is no need for a deep dive into the technical or security aspects here, since many great bloggers have already covered them. Instead, I want to cover the value Turbonomic brings as we head into the next steps of vulnerability patching.
What we do know is that the patches can reduce how efficiently CPU architectures handle threads, which increases CPU queuing and, in turn, affects workload performance. How large that effect is will have to be established workload by workload.
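To see why a modest per-request CPU cost increase can hurt more than its face value, here is a minimal sketch using the textbook M/M/1 queueing model, where mean response time is S / (1 - ρ) for service time S and utilization ρ. This model is an assumption chosen for illustration: real CPU schedulers, and the patches themselves, do not behave like a single M/M/1 queue, and the 10% cost increase and utilization values below are hypothetical inputs, not measured figures.

```python
# Minimal M/M/1 sketch (an illustrative assumption, not a model of any specific
# CPU, hypervisor, or patch): mean response time R = S / (1 - rho), where S is
# the mean service time and rho is utilization, with 0 <= rho < 1.


def mm1_response_time(service_time: float, utilization: float) -> float:
    """Mean time a request spends queued plus being served in an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def slowdown_after_patch(base_utilization: float, cpu_cost_increase: float) -> float:
    """Ratio of new to old mean response time when each request costs more CPU.

    If every request needs (1 + cpu_cost_increase) times as much CPU, both the
    service time and the utilization grow by that factor. Raises ValueError if
    the projected utilization reaches 100%.
    """
    factor = 1.0 + cpu_cost_increase
    before = mm1_response_time(1.0, base_utilization)
    after = mm1_response_time(factor, base_utilization * factor)
    return after / before


if __name__ == "__main__":
    # Hypothetical inputs: the same 10% CPU cost increase at three starting utilizations.
    for rho in (0.2, 0.5, 0.8):
        ratio = slowdown_after_patch(rho, 0.10)
        print(f"utilization {rho:.0%}: response time x{ratio:.2f}")
```

Under these assumptions, the same 10% increase in CPU cost stretches mean response time by about 13% at 20% utilization but by about 83% at 80% utilization, which is one reason the impact varies so much from workload to workload.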
Preparing for the Impact
Since the company was founded, the Turbonomic team has focused on improving performance in the face of infrastructure challenges. The Meltdown and Spectre patches may not reduce performance by a fixed, linear amount, but the impact can be modelled using some of the published figures, or even a worst-case scenario. Turbonomic can help you plan how to respond to the potential performance impact.
Turbonomic's modelling and planning features let you accurately measure, within your own environment, how changes in utilization affect performance, and how to optimize in light of those changes.
With the out-of-the-box planning features, you can create a custom plan that adds a simulated percentage of load to the environment. You can scope the plan to any subset of the environment: a host, a cluster, a data center, or any other scope.
Even a conservative percentage may produce results that surprise you with their impact on the environment.
To simulate the environment without any sizing or scaling changes by Turbonomic, adjust the Automation level in the plan scenario; this limits the changes proposed in the plan results.
In this demo environment, even a seemingly small 7% increase in load was enough to require an additional host.
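The arithmetic behind this kind of result can be approximated with a simple aggregate-capacity check. The sketch below is not Turbonomic's analysis engine, which weighs many more resources and constraints; it assumes a cluster of identical hosts, a hypothetical 80% CPU utilization ceiling per host, and made-up demand figures, to show how a cluster running close to its limit can need another host after a single-digit load increase. The function names are hypothetical.

```python
import math
from typing import Tuple


def hosts_required(demand_ghz: float, host_capacity_ghz: float,
                   max_utilization: float = 0.8) -> int:
    """Fewest identical hosts that keep aggregate CPU demand under the ceiling."""
    usable_per_host = host_capacity_ghz * max_utilization
    return math.ceil(demand_ghz / usable_per_host)


def plan_load_increase(current_demand_ghz: float, host_capacity_ghz: float,
                       hosts_today: int, load_increase: float,
                       max_utilization: float = 0.8) -> Tuple[float, int]:
    """Apply a simulated percentage uplift; return projected demand and extra hosts."""
    projected = current_demand_ghz * (1.0 + load_increase)
    needed = hosts_required(projected, host_capacity_ghz, max_utilization)
    return projected, max(0, needed - hosts_today)


if __name__ == "__main__":
    # Hypothetical cluster: 4 hosts of 100 GHz each, currently using 300 GHz in total.
    for uplift in (0.0, 0.07, 0.15):
        projected, extra = plan_load_increase(300.0, 100.0, 4, uplift)
        print(f"+{uplift:.0%} load -> {projected:.0f} GHz demand, {extra} extra host(s)")
```

With these made-up inputs, 300 GHz fits under the 320 GHz usable ceiling of four hosts, while a 7% uplift (321 GHz) crosses it and calls for a fifth host. The inputs were picked to sit just below the ceiling and are not the demo environment's figures; the point is that remaining headroom, not the size of the percentage alone, decides the outcome.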
That is just one planning scenario. We should also use Turbonomic's built-in capabilities to optimize the environment for Day 2 and beyond.
Optimizing and Managing the Real-Time Environment
Once you know the impact, Turbonomic can help you mitigate some of it. Already a Turbonomic customer? You should consider doing the following:
- Take advantage of auto-placement – Let Turbonomic choose the right place to run your workloads to meet real-time demand, including host-level CPU queuing challenges and much more
- Take advantage of auto-scaling – Scale up, down, or out, depending on the workload profile, all powered by the intelligent decision engine
- Create superclusters – Leverage the hardware pool by extending workload reach across clusters to better utilize the underlying infrastructure
- Reduce performance bottlenecks in other parts of the environment – Turbonomic can proactively address storage latency, network latency, application QoS, and much more to deliver better workload and infrastructure performance
- Ensure the right cloud templates are being used – As public cloud infrastructure changes, Turbonomic selects the best template to strike the right balance between performance and cost
- Automate at any scope of the environment – Choose your level of automation (placement, scaling, provisioning, etc.) on any scope of the environment (app, VM, host, cluster, data center, cloud region, etc.)
By taking actions such as scaling, sizing, starting, and placing VMs, instances, and containers, on-premises and in the cloud, to make the most effective use of the underlying host and cloud infrastructure, you will see improvements in real time throughout the environment.
Public clouds are even more challenging, because you must choose templates that match the performance profile of your workloads in a continuously fluctuating and opaque environment. Turbonomic continuously optimizes the real-time environment and models your workload deployments and migrations based on their actual performance and cost requirements.
Using the same model as above, but allowing Turbonomic to dynamically resize workloads, the environment absorbs the same 7% increase in load without any additional hosts.
Only when the load increase was more than doubled, to 15%, did the plan trigger an action to provision a new host, because dynamic resizing satisfied the environment's demand up to that point. This example shows how Turbonomic can help you, both in planning and in continuously optimizing the health of your environment through trustworthy decisions and built-in automation.
As we all look ahead to remediating the Meltdown and Spectre vulnerabilities, it is worth considering the importance of systems thinking, both for centrally patching to ensure safety and for optimization. To try Turbonomic in your environment, go to the download site (http://turbonomic.com/download) and begin your free 30-day trial today.