Scheduling and Workflow

When you’re working with big data in a distributed, parallel processing environment like Hadoop, job scheduling and workflow management are vital for efficient operation. Schedulers enable you to share resources at a job level within Hadoop; in the first half of this chapter, I use practical examples to guide you in installing, configuring, and using the Fair and Capacity schedulers for Hadoop V1 and V2. Additionally, at a higher level, workflow tools enable you to manage the relationships between jobs. For instance, a workflow might include jobs that source, clean, process, and output a data source. Each job ...
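To make "managing the relationships between jobs" concrete, here is a minimal Python sketch. It models the source, clean, process, and output chain described above as a dependency graph and runs each job only after its upstream jobs have finished. The job names, `WORKFLOW`, `run_job`, and `run_workflow` are hypothetical illustrations. They are not the API of any Hadoop workflow tool. The sketch uses only the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on,
# following the source -> clean -> process -> output chain in the text.
WORKFLOW: dict[str, set[str]] = {
    "source": set(),
    "clean": {"source"},
    "process": {"clean"},
    "output": {"process"},
}


def run_job(name: str) -> None:
    # Placeholder: a real workflow tool would submit a Hadoop job here.
    print(f"running {name}")


def run_workflow(dependencies: dict[str, set[str]]) -> list[str]:
    """Run jobs in dependency order and return the order used.

    static_order() raises graphlib.CycleError (a ValueError subclass)
    before any job runs if the dependencies contain a cycle.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for job in order:
        run_job(job)
    return order


if __name__ == "__main__":
    run_workflow(WORKFLOW)
```

The graph form is a deliberate choice. Adding a job that depends on several others, such as a second cleaning step feeding `process`, only changes the dictionary. The runner itself stays the same.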

From Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset.