Chapter 6. Oozie Coordinator

In the previous two chapters, we covered the Oozie workflow in great detail. In addition to the workflow, Oozie supports another abstraction called the coordinator, which schedules and executes workflows based on triggers. We briefly introduced the coordinator in Chapter 2. In this chapter, we cover the Oozie coordinator comprehensively using real-life use cases. We present multiple scenarios to demonstrate how the coordinator can trigger workflows based on time, and we describe the operational knobs it provides to control workflow execution. We will get into the data availability–based workflow trigger in Chapter 7.

Coordinator Concept

As described in Chapter 5, an Oozie workflow can be invoked manually and on demand using the Oozie command-line interface (CLI). This is sufficient for a few basic use cases. For most practical use cases, however, it is inadequate and very difficult to manage. Consider a scenario where a workflow needs to start in response to some external trigger: as soon as a predefined condition or predicate is satisfied, the corresponding workflow should be executed. For example, we could have a requirement to run the workflow every day at 2 a.m. This behavior is very hard to achieve using just the CLI and basic scripting. There are two main reasons for this:

  • The specification of ...
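For orientation, the daily 2 a.m. requirement mentioned above could be expressed as a time-triggered coordinator roughly along the following lines. This is a minimal sketch, not a definition taken from this chapter: the application name, the start and end dates, the UTC timezone, and the workflow path are all hypothetical placeholders, and the schema version is an assumption that may differ in your Oozie release.

```xml
<!-- Minimal sketch of a time-based coordinator (all values are illustrative). -->
<!-- The first run happens at the start time (02:00 UTC); the frequency then   -->
<!-- repeats the run once a day until the end time is reached.                 -->
<coordinator-app name="daily-2am-coord"
                 frequency="${coord:days(1)}"
                 start="2014-01-01T02:00Z"
                 end="2014-12-31T02:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Hypothetical HDFS path to the workflow application directory -->
      <app-path>${nameNode}/user/${user.name}/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The point of the sketch is that the schedule lives in the job definition itself and is managed by the Oozie server, rather than in an external script that calls the CLI.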
