This post was originally published here.
The community has been on a Kubernetes multi-cluster journey since 2015 and we’ve learned a lot along the way. As we approach KubeCon North America in San Diego, I’d like to reflect on the origins of that journey, how we got to where we are today, and what the SIG is aiming for.
A short note on Kubernetes
Kubernetes started as a container orchestrator, building on the rapid popularity of Docker, the containerization tool that made it easy to build, run, and distribute containers. Other similar open source orchestration projects started around the same time as Kubernetes, Cloud Foundry and Mesos being the most notable. One can argue that those projects had different goals or belong in a different space; for example, that Mesos is a generic scheduler. Nevertheless, I think they have always been direct competitors.
Kubernetes emerged as the clear winner, becoming not only one of the most successful open source projects of all time, but also the de facto container orchestration platform across the industry. To quote the kubernetes.io documentation, it has evolved into a “system for automating deployment, scaling and management of containerized applications,” and it has been adopted by all major cloud vendors, who also offer additional features around their managed offerings.
Kubernetes works around the concept of a cluster, a pool of resources grouped together via virtual machines or physical machines, aka nodes/hosts, which can run containerized workloads. It lets you manage this pool of resources or the cluster efficiently and easily, while also defining and providing an easy mechanism to deploy and manage the lifecycle of your containerized application on this pool. It additionally defines the mechanisms of networking, storage, scaling, upgrading and discovering the deployed applications. Under the hood it can orchestrate replicas of the same application across multiple nodes/hosts while allowing the same nodes/hosts also to be shared by different applications.
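To make the idea of a shared pool concrete, here is a toy sketch of replicas of two applications being spread over the same set of nodes. It is only an illustration: the node and application names are made up, and the real Kubernetes scheduler weighs resource requests, affinity rules and much more rather than a plain round-robin.

```python
from collections import defaultdict
from itertools import cycle


def place_replicas(nodes, apps):
    """Round-robin each app's replicas over the nodes.

    nodes: list of node names (hypothetical).
    apps:  dict of app name -> desired replica count.
    Returns a dict of node name -> list of app names placed on it.
    """
    placement = defaultdict(list)
    node_cycle = cycle(nodes)
    for app, replicas in apps.items():
        for _ in range(replicas):
            placement[next(node_cycle)].append(app)
    return dict(placement)


placement = place_replicas(["node-a", "node-b", "node-c"], {"web": 3, "cache": 2})
for node, node_apps in placement.items():
    print(node, node_apps)
```

Each application ends up with replicas on more than one node, and two of the nodes host replicas of both applications.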
Going beyond a single cluster
This post, however, is about multiple Kubernetes clusters. When is a single cluster not sufficient? Why would one even need multiple clusters? The original paper introducing the concept of a uniform multi-cluster control plane, aka “Ubernetes,” did a very good job of highlighting the major reasons for multiple clusters. Those reasons still hold true today. The next section lists most of the use cases that are relevant for Kubernetes, building on those listed in that paper.
The use cases for multi-cluster Kubernetes
Kubernetes offers features which can provide some level of multi-tenancy, but Kubernetes is not truly multi-tenant. This is partly because containerization does not offer perfect isolation between co-located applications. Many real-world use cases need to consider the security implications of sharing resources between multiple tenants, which in most cases leads to not sharing the same resources. Instead, tenants are either partitioned by cluster nodes, or get different clusters altogether. Large enterprises more often than not choose the latter, i.e. different clusters for different tenants. The same is true for purpose: it’s fairly common to hear of “dev clusters,” “test clusters,” “prod clusters,” and so on. One can argue that multi-tenancy is not a reason to run multiple clusters, but I have seen this as one of the most common reasons enterprises do it.
Capacity overflow/Cloud bursting
Cloud bursting is probably one of the most useful and cost-driven multicloud scenarios: use the more expensive public cloud resources only when the on-premises resources are not sufficient. The concept in fact predates Kubernetes, and many enterprises implemented it in their own ways. Before PaaS software gained widespread usage, IaaS software had solutions for the concept. One example in open source was Jacket for OpenStack. For Kubernetes there is no clear open source solution; however, vendors providing a managed Kubernetes service might offer a facility to achieve the same.
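The bursting rule itself is simple to state in code. The sketch below is a minimal illustration and not any vendor's implementation: the workload names, CPU figures and the "on-prem"/"public-cloud" targets are hypothetical, and capacity is reduced to a single CPU number.

```python
def burst_schedule(workloads, on_prem_capacity):
    """Place workloads on-premises first and overflow to the public cloud.

    workloads:        list of (name, cpu_needed) tuples, in arrival order.
    on_prem_capacity: total CPU available on-premises.
    Returns a dict of workload name -> "on-prem" or "public-cloud".
    """
    remaining = on_prem_capacity
    assignment = {}
    for name, cpu in workloads:
        if cpu <= remaining:
            # Cheaper on-premises capacity is still available.
            assignment[name] = "on-prem"
            remaining -= cpu
        else:
            # Not enough room left on-premises: burst to the public cloud.
            assignment[name] = "public-cloud"
    return assignment


demo = burst_schedule(
    [("billing", 4), ("search", 6), ("batch", 5), ("cron", 2)],
    on_prem_capacity=12,
)
print(demo)
```

With 12 CPUs on-premises, only the "batch" workload spills over to the public cloud; the smaller "cron" workload that arrives later still fits in the remaining on-premises capacity.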
You can learn more about cloud bursting in Kubernetes from this talk at KubeCon Seattle last year.
A single cluster provides reasonable fault tolerance for simple containers and can take care of containers dying or not responding. However, a single cluster cannot withstand network outages (which are fairly common), data center problems, natural calamities, etc. For the purposes of high availability, it’s reasonably common for applications to span multiple clusters across different data centers (possibly their own), different vendors and different geographies.
The need to be compliant with local laws and security policies is a common use case, especially in the banking and telecom sectors. Generally, a single cluster can’t be compliant with each and every rule everywhere. Having dedicated clusters audited for specific requirements is usually the solution. Enterprises also need to ensure that applications are audited, marked for compliance, and run only in the clusters labelled as suitable for them.
Although the public internet is fast, it is still limited by physics. This is the very reason organizations at times choose to host applications in the region where their users are. Different clusters in different geographies cater to this specific case.
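One way to picture the "clusters labelled suitable" idea is as a set comparison between what an application requires and what each cluster has been audited for. The sketch below assumes exactly that; the cluster names and compliance labels are hypothetical and do not correspond to any Kubernetes or vendor API.

```python
def eligible_clusters(app_requirements, clusters):
    """Return the clusters whose audited labels cover every app requirement.

    app_requirements: iterable of compliance labels the application needs.
    clusters:         dict of cluster name -> set of labels it was audited for.
    Returns a sorted list of eligible cluster names.
    """
    required = set(app_requirements)
    return sorted(
        name for name, labels in clusters.items() if required <= set(labels)
    )


# Hypothetical clusters and the requirements each one was audited for.
clusters = {
    "eu-bank": {"gdpr", "pci-dss"},
    "eu-general": {"gdpr"},
    "us-general": {"soc2"},
}

print(eligible_clusters({"gdpr", "pci-dss"}, clusters))
print(eligible_clusters({"gdpr"}, clusters))
```

An application that needs both labels can only land in "eu-bank", while one that only needs "gdpr" may run in either of the two EU clusters.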
Vendor lock-in avoidance
There are a lot of cloud vendors out there, and not all offerings are created equal. Most large cloud users do not stick to a single vendor. Because pricing models and offerings change, users change their usage models over time too. Kubernetes enables applications to be cloud agnostic, while having multiple clusters is a solution when there is a need to use multiple vendors or to move applications from one cloud to another over time.
The Evolution of the Multicluster SIG in Kubernetes
Ubernetes, To Be or Not to Be
Multicluster discussions in the Kubernetes community started as early as the beginning of 2015. At the same time, Kubernetes itself was in its initial stages of development. Some founding members of Kubernetes considered it worthwhile to kickstart the discussions and ensure that the nuances of multicluster get baked inherently into the API from the start. This presentation is one of the first talks discussing multiple clusters that was presented at one of the early KubeCons. The initial idea was focused on a single solution for all the possible Kubernetes multi-cluster problems aka “Ubernetes.” Most of the stakeholders thought that it would be possible to build a control plane akin to the Kubernetes control plane, which could cater to all multicloud use cases one could think of.
That was not to be. The problems at hand proved to be incredibly difficult to solve collectively. Additionally, in the following years Kubernetes itself also grew into a behemoth where there had to be special efforts to break down the whole monolith into smaller co-working pieces. Some examples of this can be found here and here.
From Ubernetes to Federation
Subsequent efforts had to narrow down the multicluster problems, iterating over multiple stages. The first efforts narrowed the Ubernetes definition to “Federation”: applying the concept of the Kubernetes API server to multiple clusters. It considered clusters akin to nodes and the Federation control plane akin to the Kubernetes control plane. This meant using the Kubernetes resource API also as the Federation API, simply going a level up in abstraction.
This view came with its own challenges, especially because there was no clear API space to specify and validate properties that were specific to multiple clusters; they all lived in general-purpose annotations. Briefly, this approach provided a Federation API server that acted as a simple client to multiple Kubernetes API servers while exposing the same Kubernetes resource API. A Federation user could therefore create a resource, for example a Deployment or a ConfigMap, against the Federation API server exactly as they would against a Kubernetes API server. The approach looked promising for its simplicity, since the same clients and client tools could work against the Federation API server. In fact, it looked quite elegant at the time and worked well enough up to the prototype implementation. It could not, however, mature along a clear evolutionary path while keeping pace with the Kubernetes API, which was evolving at the same time, and still exposing that same API as the Federation API.
Federation could not simply reuse the whole API. Questions like “Where does the Federation-specific API content live?” and “How does one give meaning to each kind of resource across multiple clusters?” became difficult to answer. Spreading replicas evenly across clusters in the case of replicated workloads, while creating the same resource in all clusters in the case of a Secret or ConfigMap, was hard to express coherently through one reused API. This section of the SIG intro at KubeCon EU 2018 briefly discusses some of these problems.
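The two behaviours mentioned above can be sketched side by side. This is not Federation's actual code: the function, the REPLICATED_KINDS set and the dictionary-shaped specs are assumptions made for illustration. It only shows why one reused API needed different multi-cluster semantics per kind of resource.

```python
# Hypothetical set of kinds treated as replicated workloads in this sketch.
REPLICATED_KINDS = {"Deployment", "ReplicaSet"}


def propagate(kind, spec, clusters):
    """Return the spec each member cluster would receive.

    Replicated workloads have their replicas split as evenly as possible;
    every other kind (e.g. Secret, ConfigMap) is copied unchanged.
    """
    if kind in REPLICATED_KINDS:
        base, extra = divmod(spec["replicas"], len(clusters))
        return {
            # The first `extra` clusters absorb the remainder, one each.
            cluster: {**spec, "replicas": base + (1 if i < extra else 0)}
            for i, cluster in enumerate(clusters)
        }
    return {cluster: dict(spec) for cluster in clusters}


members = ["cluster-a", "cluster-b", "cluster-c"]
print(propagate("Deployment", {"replicas": 7}, members))
print(propagate("ConfigMap", {"data": {"mode": "prod"}}, members))
```

Seven replicas are split 3/2/2 across the three clusters, while the ConfigMap is created identically in each. In this sketch the choice between the two is hard-coded by kind; the difficulty described above was that the reused Kubernetes API had no dedicated place to express such choices.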
Subsequently, the SIG’s focus was broadened to be more inclusive of other possible solutions and projects, and SIG Federation was renamed SIG Multicluster.
Within Federation, additional efforts further narrowed the scope to API resource management only, keeping lower-level implementations, for example a multi-cluster network overlay or multi-cluster storage, completely out of scope. The goal was to define an API and building blocks which can be independently developed, iterated over and extended for custom implementations. The community effort in its current shape lives at Kubernetes Cluster Federation, aka KubeFed.
Although Kubernetes Cluster Federation has been the single biggest effort the Multicluster SIG has seen in Kubernetes, there have been other notable projects and efforts which were presented at KubeCons, discussed in the SIG, or simply lie in the same space. I will be discussing KubeFed in some detail, along with a few of these notable efforts and projects, in a subsequent article.
Where do we go from here?
The multi-cluster challenge for the Kubernetes community is a big open space and the problems to solve are complex. No wonder the original vision of Ubernetes as a single solution to all problems did not pan out well.
Another issue with this space has been that over the years it has not seen the kind of open source collaborators, sponsors to be exact, that it deserves. For example, the biggest cloud vendors, AWS and Azure, are missing from the SIG, and Google’s participation is almost negligible as of now. In my opinion this has partly been because of the huge monetary potential this space has.
The major vendors are probably seeing this huge opportunity to compete in the space, rather than collaborate and drive standard solutions.
Google’s path is a good example. It saw more sense in approaching the whole space as a set of solutions. After the success of GKE, Google started building GKE on-prem in-house, was key in making Istio widely adopted among users, threw in Stackdriver, and came up with Google Cloud Interconnect. Its acquisition of Velostrata completed the puzzle, with the overall set of solutions eventually branded as Anthos. The pricing of Anthos reflects Google’s confidence in the monetization potential of this space.
It is going to be extremely interesting to watch how the other cloud vendors come up with their own solutions to claim their share.
It is also going to be interesting to watch how the open source collaborations in SIG Multicluster pan out in the coming months and what direction they take. As discussed earlier, it’s difficult to solve everything at once. Some of our biggest efforts in the SIG right now are to reach out to the community and to users who run multiple Kubernetes clusters, and to focus on solving the most common problems first. We are also looking for collaborators and ideas which can bring in wider community contributions in the future.
For those of you going to KubeCon San Diego in November, a few other SIG folks and I will be presenting the SIG Multicluster Intro + Deep Dive at the show. We will give you an overview of the SIG’s current efforts, how best to get involved, and what our future plans look like.
Irfan is a senior engineer associated with the Advanced Engineering group at Turbonomic. In his current role he is tasked with drafting multi-cluster capabilities for Turbonomic’s analytics platform. He has also been associated with SIG Multicluster, particularly the Kubernetes Cluster Federation (KubeFed) project, and strives to mature it into useful software.