You use containers for daily development and operations, or you have heard of the many container cluster solutions offered by public clouds. You like the idea of running containers directly instead of virtual machines.
You can’t use public clouds because:
- Moving all your data over the internet to the cloud is inefficient
- Your data is too sensitive to put on a public cloud
- They’re just too expensive for your scale
- You have a specific need like storing huge amounts of video
The hypothetical list goes on. The bottom line is that you need something on-premise.
For a question containing both on-premise and cloud in the same sentence, the state-of-the-art industry answer is Hyper-Converged Infrastructure (HCI). Our solution follows the same logic of providing compute, storage and networking on the same hardware, but differs in how it achieves that.
Since the requirements were specific in this case, we could design the whole stack from scratch instead of buying something with generic characteristics. It begins with hardware. We based our solution on Supermicro servers and, with fantastic technical guidance from their experts, customized them according to the needs of this solution.
The second part covers deploying and testing the hardware, making replacements where necessary at each step, and finally installing the operating systems. MAAS is a very helpful tool here: it identifies the servers, runs the necessary tests, reports the results and finally installs the operating systems. Needless to say, we designed dedicated networks for IPMI management and PXE boot so that all of those operations can be done remotely, both during initial deployment and when the system needs expansion.
The third part is deploying Kubernetes. While there are many ways of doing that, both open-source and proprietary, most of them target public clouds like AWS. Kubespray, supported by the core project, is probably the most flexible way of installing a production-grade Kubernetes on bare metal. It handles the generic parts of the installation well, but we’re in a custom environment with customized servers and we need the applications within the cluster to interact with the outside. So we added crucial parts like apiserver load balancing and hardware monitoring, and of course…
Networking is a huge topic in Kubernetes with lots of different options, but it all comes down to the type of environment the cluster is running in. If it’s AWS and you’re able to use virtual routers when necessary, that’s one type. Ours is an L2 VLAN with a single gateway, and all pods needed to be directly accessible from outside the cluster. So we chose the most basic option: Linux bridges. It’s a bit unconventional and has some limits, yes, but sticking with the basics for the most critical part of the cluster has the benefit of being solid when you need it.
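To make the bridge idea concrete, here is a minimal sketch of what a per-node network definition could look like. It assumes the standard CNI `bridge` plugin with `host-local` IPAM, which may not be exactly how our deployment wires things up. The bridge name `br0`, the network name and all addresses are hypothetical, and the sketch assumes `br0` already exists on the host with the node's VLAN interface attached. Each node gets its own slice of the shared subnet, masquerading is off so pods keep their real addresses, and the default route points at the single VLAN gateway.

```python
import ipaddress
import json


def bridge_cni_config(bridge, subnet, range_start, range_end, gateway):
    """Build a CNI 'bridge' plugin config for one node on a flat L2 network."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    gw = ipaddress.ip_address(gateway)
    # Every address must sit inside the shared VLAN subnet.
    for addr in (start, end, gw):
        if addr not in net:
            raise ValueError(f"{addr} is outside {net}")
    if start > end:
        raise ValueError("range_start must not exceed range_end")
    if start <= gw <= end:
        raise ValueError("gateway must not fall inside the pod range")
    return {
        "cniVersion": "0.3.1",
        "name": "pod-network",   # hypothetical network name
        "type": "bridge",
        "bridge": bridge,        # assumed to be pre-created on the host
        "isGateway": False,      # the VLAN gateway routes, not the node
        "ipMasq": False,         # no NAT: pods are reachable directly
        "ipam": {
            "type": "host-local",
            "ranges": [[{
                "subnet": str(net),
                "rangeStart": str(start),
                "rangeEnd": str(end),
                "gateway": str(gw),
            }]],
            "routes": [{"dst": "0.0.0.0/0"}],
        },
    }


if __name__ == "__main__":
    # Hypothetical node slice inside a hypothetical /16 VLAN.
    cfg = bridge_cni_config("br0", "10.20.0.0/16",
                            "10.20.1.1", "10.20.1.254", "10.20.0.1")
    print(json.dumps(cfg, indent=2))
```

One limit shows up right away in a layout like this: pod address ranges have to be carved out of the VLAN subnet node by node, so the size of the subnet caps the size of the cluster.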
The last missing part is a place for applications to store their data. It is without a doubt the most complex part of any virtualized infrastructure and the hardest to get right. That’s probably why most HCI vendors are actually storage vendors. For this solution, we are using Ceph. Kubernetes persistent volumes are backed by Ceph block devices called RBDs and are accessed using the kernel driver. To manage the complexity of deploying Ceph, we used Rook. That means the storage itself is a resource in the Kubernetes cluster and can be managed as such.
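From an application's point of view, asking for one of those block devices is just another Kubernetes object. The sketch below builds a standard PersistentVolumeClaim as JSON, which `kubectl apply -f` accepts as readily as YAML. The storage class name `rook-ceph-block` is an assumption borrowed from Rook's examples, and the claim name and size are hypothetical. The real class name depends on how Rook was set up in the cluster.

```python
import json


def rbd_volume_claim(name, size_gib, storage_class="rook-ceph-block"):
    """Build a PersistentVolumeClaim for a Ceph-backed block volume."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Assumed class name; it must match a StorageClass in the cluster.
            "storageClassName": storage_class,
            # A block device with a filesystem is mounted by one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim for a 50 GiB volume.
    print(json.dumps(rbd_volume_claim("video-index-data", 50), indent=2))
```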
Now that it is deployed in production, all that is left for the thousands of applications running on this solution is to define their CPU, memory and storage requirements, and they will run somewhere on a cluster spanning hundreds of nodes containing thousands of drives. All with open-source technologies on standard hardware, blended with years of experience in running clusters.
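As an illustration of how little an application has to declare, here is a sketch of a pod definition carrying exactly those three things: CPU, memory and a reference to a storage claim. The pod name, image, quantities and claim name are all hypothetical, and real workloads would more likely be wrapped in a Deployment or StatefulSet than written as a bare pod.

```python
import json


def app_pod(name, image, cpu, memory, claim_name, mount_path="/data"):
    """Build a Pod manifest declaring CPU, memory and a persistent volume."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                # Equal requests and limits place the pod in the
                # Guaranteed QoS class.
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            # The claim must already exist in the same namespace.
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical application asking for half a core and 1 GiB of memory.
    pod = app_pod("video-index", "registry.example/video-index:1.0",
                  "500m", "1Gi", "video-index-data")
    print(json.dumps(pod, indent=2))
```

The scheduler takes it from there and picks a node with enough free CPU and memory, and the volume follows the pod to whichever node that turns out to be.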