Chapter 2. Scheduling in Distributed Systems
Introduction
In distributed computing, a scheduler is responsible for managing incoming container requests and determining which containers to run next, on which node to run them, and how many containers to run in parallel on the node. (Container is a general term for individual parts of a job; some systems use other terms such as task to refer to a container.) Schedulers range in complexity, with the simplest having a straightforward first-in–first-out (FIFO) policy. Different schedulers place more or less importance on various (often conflicting) goals, such as the following:
Utilizing cluster resources as fully as possible
Giving each user and group fair access to the cluster
Ensuring that high-priority or latency-sensitive jobs complete on time
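The FIFO policy mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of any real scheduler; the class and request fields (`FifoScheduler`, `ContainerRequest`, `cpus`, `memory_mb`) are assumed names:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ContainerRequest:
    """One container of a job, with its resource demands (illustrative fields)."""
    job_id: str
    cpus: int
    memory_mb: int

class FifoScheduler:
    """Simplest policy: run requests strictly in arrival order."""

    def __init__(self):
        self.queue = deque()

    def submit(self, request: ContainerRequest) -> None:
        self.queue.append(request)

    def schedule(self, free_cpus: int, free_memory_mb: int) -> list:
        """Place as many containers as fit on one node, strictly from the
        head of the queue: stop at the first request that does not fit,
        even if a later, smaller request would."""
        placed = []
        while self.queue:
            head = self.queue[0]
            if head.cpus <= free_cpus and head.memory_mb <= free_memory_mb:
                self.queue.popleft()
                free_cpus -= head.cpus
                free_memory_mb -= head.memory_mb
                placed.append(head)
            else:
                break
        return placed

# Example: a large request at the head blocks a small one behind it.
s = FifoScheduler()
s.submit(ContainerRequest("a", cpus=2, memory_mb=1024))
s.submit(ContainerRequest("b", cpus=4, memory_mb=2048))
s.submit(ContainerRequest("c", cpus=1, memory_mb=512))
placed = s.schedule(free_cpus=4, free_memory_mb=4096)  # places only "a"
```

The example illustrates the head-of-line blocking inherent in strict FIFO: after "a" is placed, "b" no longer fits, so the scheduler stops even though "c" would fit in the remaining capacity. This is precisely the weakness that motivates the more sophisticated policies discussed below.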
Multi-tenant distributed systems generally prioritize fairness among users and groups over optimal packing and maximal resource usage; without fairness, users would be likely to maximize their own access to the cluster without regard to others’ needs. Different groups and business units would also be inclined to run their own smaller, less efficient clusters to guarantee access for their users.
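Fairness among users is commonly formalized as max-min fair sharing: no user can get more of a resource by taking it from a user who already has less. A standard way to compute it is "water-filling". The sketch below, with hypothetical names and a single resource dimension for simplicity, is one such computation, not the algorithm of any particular scheduler:

```python
def max_min_fair_shares(capacity: float, demands: dict) -> dict:
    """Water-filling: repeatedly offer each still-unsatisfied user an equal
    split of the remaining capacity; a user who needs less than the split
    takes only what it demands, and the surplus is redistributed."""
    shares = {user: 0.0 for user in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        split = remaining / len(active)
        satisfied = set()
        for user in active:
            need = demands[user] - shares[user]
            give = min(split, need)
            shares[user] += give
            remaining -= give
            if need <= split:
                satisfied.add(user)
        active -= satisfied
    return shares

# Example: 10 units of capacity, three users demanding 2, 4, and 10.
# The light users are fully satisfied; the heavy user gets the rest.
shares = max_min_fair_shares(10, {"a": 2, "b": 4, "c": 10})
```

In this example user "a" receives 2 and user "b" receives 4 (their full demands), while user "c" receives the remaining 4: the heavy user absorbs the surplus left by the light users, but can never squeeze them below their fair split.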
In the context of Hadoop, one of two schedulers is most commonly used: the capacity scheduler and the fair scheduler. Historically, each scheduler was written as an extension of the simple FIFO scheduler, and initially each had a different goal, as their names indicate. Over time, the two schedulers ...