May 2017
Beginner to intermediate
596 pages
15h 2m
English
Apache Oozie is an open source, Java-based web application for building and scheduling job pipelines, and it is well integrated with the Hadoop stack.
Oozie can be used to schedule and run Hadoop jobs in a cluster. It combines small jobs into larger, more complex ones by following a configured pipeline (a workflow) that fits the required use case. Oozie triggers the configured workflow and relies on the Hadoop engine to execute the individual jobs in it.
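The idea of chaining small jobs into a larger pipeline can be sketched with a toy model. This is not Oozie's API or its workflow format; the function `run_workflow`, the node names, and the `end`/`kill` terminal nodes are assumptions made for illustration, and each job is a plain Python function standing in for a job submitted to Hadoop.

```python
# Toy model of a workflow: each node maps to (job, next node on success,
# next node on failure). All names here are hypothetical, not Oozie's API.

def run_workflow(actions, start):
    """Run nodes from `start` until the 'end' or 'kill' node is reached."""
    node = start
    trace = []
    while node not in ("end", "kill"):
        job, ok_node, error_node = actions[node]
        try:
            job()  # stand-in for handing a job to the execution engine
            trace.append((node, "OK"))
            node = ok_node
        except Exception:
            trace.append((node, "ERROR"))
            node = error_node
    return node, trace


def ingest():
    pass  # placeholder for a small job


def transform():
    pass  # placeholder for a small job


def broken_export():
    raise RuntimeError("export failed")


pipeline = {
    "ingest": (ingest, "transform", "kill"),
    "transform": (transform, "export", "kill"),
    "export": (broken_export, "end", "kill"),
}

final_node, trace = run_workflow(pipeline, "ingest")
print(final_node, trace)
```

In this sketch the first two jobs succeed and the third fails, so the run stops at the `kill` node; replacing `broken_export` with a job that succeeds lets the run reach `end`.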
Oozie detects the completion of a task through two mechanisms: callback and polling. When Oozie starts a task, it gives the task a unique callback URL, which is invoked when the task completes. If the callback is not received, Oozie polls the task for its status.
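The two detection mechanisms can be modeled in a few lines. This is a simplified sketch that assumes polling acts as a fallback when no callback has arrived; the class `CompletionTracker`, the URL shape, and the status strings are hypothetical and do not reflect Oozie's internals.

```python
class CompletionTracker:
    """Toy tracker: learns of task completion by callback, else by polling."""

    def __init__(self, get_status):
        self._get_status = get_status  # hypothetical status lookup function
        self._done = {}                # task id -> how completion was detected

    def callback_url(self, task_id):
        # Hypothetical URL shape, used only to show that each task gets its own.
        return f"http://scheduler.example/callback?id={task_id}"

    def on_callback(self, task_id):
        # Called when a finished task invokes its callback URL.
        self._done.setdefault(task_id, "callback")

    def poll(self, task_id):
        # Fallback: ask for the status of a task that has not called back.
        if task_id not in self._done and self._get_status(task_id) == "SUCCEEDED":
            self._done[task_id] = "polling"

    def detected_by(self, task_id):
        return self._done.get(task_id)


# task-1 calls back; task-2 finished but its callback was lost; task-3 still runs.
statuses = {"task-1": "SUCCEEDED", "task-2": "SUCCEEDED", "task-3": "RUNNING"}
tracker = CompletionTracker(statuses.get)
tracker.on_callback("task-1")
for task_id in statuses:
    tracker.poll(task_id)

for task_id in statuses:
    print(task_id, tracker.detected_by(task_id))
```

Here the first task is recorded through its callback, the second is picked up only by the polling pass, and the third remains undetected because it has not finished.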
This figure shows the basic working of Oozie: