I am extremely happy that my exam session is almost over, and it’s been a while since my last blog post, so this seems like a good moment to write a new article about how Mesos works and to give some insight into the current research efforts related to it.
At a high level, Mesos can be thought of as a highly available, fault-tolerant operating system kernel for a cluster. Just as daemons or services run on top of an operating system, data processing frameworks run on top of Mesos. The analogy is even stronger if you think of Mesos as a way of abstracting CPU, memory, storage and other computational resources of a cluster away from these frameworks.
A good starting point for reading more about Mesos is the paper published at the NSDI conference, “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center”.
Each framework that runs on top of Mesos can benefit from resource isolation and from what is called resource pooling. Resource isolation is provided by container technologies such as Linux cgroups or Docker.
The principle of resource pooling is used not only in distributed systems but also in networking protocols. For example, Multipath TCP (MPTCP), an extension of TCP, achieves it by transmitting data simultaneously over multiple subflows and shifting traffic from congested paths to less congested ones, using coupled congestion control.
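To make the idea of coupled congestion control a bit more concrete, here is a minimal sketch in Python. It assumes the “Linked Increases” rule described in RFC 6356, with windows measured in packets. The post does not say which coupled algorithm is meant, so treat that choice, and the function names lia_alpha, on_ack and on_loss, as my own illustration rather than how any particular MPTCP stack is written.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor shared by all subflows (RFC 6356 style, packet units)."""
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom


def on_ack(cwnds, rtts, i):
    """Return the new windows after one ACK arrives on subflow i."""
    alpha = lia_alpha(cwnds, rtts)
    # Coupled increase: never more aggressive than a regular TCP flow
    # on the same path (the 1 / cwnd_i term).
    increase = min(alpha / sum(cwnds), 1.0 / cwnds[i])
    new = list(cwnds)
    new[i] += increase
    return new


def on_loss(cwnds, i):
    """Return the new windows after a loss on subflow i (plain TCP halving)."""
    new = list(cwnds)
    new[i] = max(new[i] / 2.0, 1.0)
    return new


if __name__ == "__main__":
    cwnds, rtts = [10.0, 10.0], [0.1, 0.1]
    print(on_ack(cwnds, rtts, 0))
    print(on_loss(cwnds, 1))
```

The increase on each subflow depends on the total window across all subflows, while the decrease is per subflow. A congested path sees more losses and keeps getting halved, so over time most of the traffic ends up on the less congested paths.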
Resources are allocated using one of two policies. The first, Dominant Resource Fairness (DRF; more details in the paper “Dominant Resource Fairness: Fair Allocation of Multiple Resource Types”), seeks to maximize the minimum dominant share across all frameworks, where a framework’s dominant share is the largest fraction of any single resource that it has been allocated. For example, if framework A is CPU-intensive and framework B is memory-intensive, the DRF algorithm attempts to equalize framework A’s share of CPU with framework B’s share of memory. The second policy is based on strict priorities.
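The DRF idea is easier to see with a small simulation. The sketch below is a simplified, hypothetical version of the progressive-filling loop from the DRF paper, not the allocator that ships with Mesos: it repeatedly gives one task to the framework with the smallest dominant share until nothing fits any more. The function name drf_allocate and the dictionary layout are my own, and the numbers are the example from the paper (9 CPUs and 18 GB of memory, A needs 1 CPU and 4 GB per task, B needs 3 CPUs and 1 GB per task).

```python
def drf_allocate(capacity, demands):
    """Toy DRF: return how many tasks each framework gets to launch.

    capacity: {resource: total amount in the cluster}
    demands:  {framework: {resource: amount needed per task}}
    """
    for name, demand in demands.items():
        if not any(demand.get(r, 0) > 0 for r in capacity):
            raise ValueError(f"framework {name!r} must demand some resource")

    used = {r: 0 for r in capacity}
    allocated = {f: {r: 0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}
    active = set(demands)

    def dominant_share(f):
        # Largest fraction of any single resource held by framework f.
        return max(allocated[f][r] / capacity[r] for r in capacity)

    while active:
        # Serve the framework with the lowest dominant share next
        # (ties are broken by name, an arbitrary choice for this sketch).
        f = min(sorted(active), key=dominant_share)
        demand = demands[f]
        if all(used[r] + demand.get(r, 0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += demand.get(r, 0)
                allocated[f][r] += demand.get(r, 0)
            tasks[f] += 1
        else:
            # The next task of f does not fit; stop considering f.
            active.remove(f)
    return tasks


if __name__ == "__main__":
    cluster = {"cpu": 9, "mem": 18}
    per_task = {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}
    print(drf_allocate(cluster, per_task))
```

With these numbers A ends up with 3 tasks (12 of 18 GB, so a dominant share of 2/3 on memory) and B with 2 tasks (6 of 9 CPUs, so a dominant share of 2/3 on CPU), which is exactly the equalization described above.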
Fault tolerance and high availability are achieved by running multiple masters, one active and the others on standby, and using ZooKeeper, a coordination service, to elect a new leader whenever the active master fails. The interface between a framework and Mesos consists of two components: a scheduler, which plans the execution of the framework’s tasks, and an executor, which is launched on slave nodes and runs those tasks. The figure below summarizes this:
I hope you found this interesting, and thanks for reading! 😀