Have you ever talked about a technology and then been told by someone, "I hear it doesn't scale"? Our industry has seen incredible growth in the number of products and technologies. Choices are wide at every layer of the data center stack, so when competitive conversations about products begin, scale is often a bargaining point. Even before it becomes a sales question, the research process leads to a lot of conversation online and elsewhere about "Does it scale?"
The real question about scale comes when we look at our current environment and its growth up to this point. Has scale been an issue? That may or may not be a question you can answer, because in the years leading up to now we have been growing both vertically and horizontally in many ways.
Understanding Scale Load Versus Scale Design
Will your IT organization be pushing the limits of your infrastructure? Most likely, that is not going to be the real issue. Application performance problems caused by design challenges are often the culprit, along with a number of potential infrastructure issues. It can be difficult to tell whether a problem stems from infrastructure performance or from infrastructure architecture, and quite often the line between them is blurred, which creates the confusion.
OpenStack Neutron has been described as having scale issues in every release since its original form as the project previously named Quantum. The reality was that the issues centered on rapidly growing the infrastructure. The organizations that discovered this challenge early were often service providers. As you can imagine, a service provider defines "at scale" in a wholly different way than most typical IT organizations would.
Docker is also a popular target when discussions about scale limits happen. Again, the challenge lies in handling the massive growth of a Docker ecosystem within your environment, and much less in the logical limits of Docker, which can theoretically scale to many hundreds of thousands of containers. East-West networking is one particular challenge, but placement and continuous awareness of the environment are the real challenges you will face long before you run beyond the technical capabilities of Docker.
Everything Will Fail At Scale
There was a very interesting article on ZDNet that featured Google VP of Engineering Urs Hölzle. It touched on the complications of dealing with a massively scaled environment such as Google's Gmail and other Google services. What I take away from the article is that no matter how far you scale, you will be exposed to some sort of risk.
With risk comes the ability to mitigate it, or at least it should. This is why BCP/DR environments can be designed around a certain set of assumptions. We wouldn't assume that every localized failure will trigger a total data center loss, so we have intermediate levels of recovery that target specific systems and applications.
BCP aside, your growth will eventually reach the point where you may face a scale challenge. What is interesting is that while people focus on where the scale limits of their current tools and infrastructure lie, they often forget to look inward at what is already happening to application performance at the current size. The first question we have to ask is whether the product itself is hitting its limit, or whether the way it has been architected is the limit.