Data Lake for Enterprises

by Vivek Mishra, Tomcy John, Pankaj Misra
May 2017
Beginner to intermediate
596 pages
15h 2m
English
Packt Publishing

Apache Oozie

Apache Oozie is an open source, Java-based web application for creating and scheduling data pipelines (workflows), and it is tightly integrated with the Hadoop stack.

Oozie schedules and runs jobs in a Hadoop cluster. It can combine several small jobs into a larger, more complex one, chaining them according to the configured pipeline to achieve the required use case. Oozie triggers the configured workflow and relies on the Hadoop engine to execute the individual jobs within it.
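The chaining of small jobs into a workflow can be illustrated with a small, self-contained Java sketch (Java 16+ for records). It is not the Oozie API. The `WorkflowSketch` class, its `Action` record, and the `end`/`kill` node names are hypothetical. They are loosely modeled on the way an Oozie workflow routes each action to a next node on success (`ok`) or on failure (`error`). Each job here is a local stub that stands in for work Oozie would hand to the Hadoop engine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical, minimal model of an Oozie-style workflow: each action names the
 * node to go to on success ("ok") and on failure ("error"). This is NOT the Oozie API.
 */
public class WorkflowSketch {

    /** One unit of work plus its success and failure transitions. */
    record Action(String name, BooleanSupplier job, String okTo, String errorTo) {}

    static final String END = "end";
    static final String KILL = "kill";

    private final Map<String, Action> actions = new LinkedHashMap<>();
    private final String startNode;

    WorkflowSketch(String startNode) {
        this.startNode = startNode;
    }

    WorkflowSketch add(Action action) {
        actions.put(action.name(), action);
        return this;
    }

    /** Follows the transitions from the start node and returns the visited path. */
    List<String> run() {
        List<String> path = new ArrayList<>();
        String current = startNode;
        while (!current.equals(END) && !current.equals(KILL)) {
            Action action = actions.get(current);
            if (action == null) {
                throw new IllegalStateException("Unknown node: " + current);
            }
            // A workflow is a DAG, so each action is visited at most once.
            if (path.size() >= actions.size()) {
                throw new IllegalStateException("Cycle detected at node: " + current);
            }
            path.add(action.name());
            // In Oozie the job would run on the Hadoop cluster; here it is a local stub.
            current = action.job().getAsBoolean() ? action.okTo() : action.errorTo();
        }
        path.add(current);
        return path;
    }

    public static void main(String[] args) {
        WorkflowSketch wf = new WorkflowSketch("ingest")
            .add(new Action("ingest", () -> true, "transform", KILL))
            .add(new Action("transform", () -> true, "publish", KILL))
            .add(new Action("publish", () -> true, END, KILL));
        System.out.println(wf.run()); // prints [ingest, transform, publish, end]
    }
}
```

Running `main` prints `[ingest, transform, publish, end]`. If any stub returned `false`, the walk would stop at the `kill` node instead. This mirrors a common pattern in Oozie workflows, where an action's error transition points at a kill node.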

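Oozie detects the completion of its tasks through two mechanisms: callback and polling. When Oozie starts a task, it provides the task with a unique callback URL, which the task invokes when it completes. If the task fails to invoke the callback URL (for example, because of a transient network failure), or if the type of task cannot invoke it at all, Oozie falls back to polling the task for completion.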
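The interplay between callback and polling can be sketched in plain Java. This is a hypothetical, in-process simulation rather than Oozie's implementation. The callback URL is replaced by a `Runnable` that releases a latch. The `callbackLost` flag, an assumption for illustration, simulates a callback that never arrives. The watcher waits for the callback for up to one poll interval at a time. If the callback has not arrived, it checks the task's status directly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of completion detection by callback with a polling
 * fallback. The names are illustrative; this is not Oozie's implementation.
 */
public class CompletionDetector {

    /** A simulated external task, e.g. a job running on the cluster. */
    static class Task implements Runnable {
        private final AtomicBoolean done = new AtomicBoolean(false);
        private final Runnable callback;     // stands in for the callback URL
        private final boolean callbackLost;  // simulates a callback that never arrives

        Task(Runnable callback, boolean callbackLost) {
            this.callback = callback;
            this.callbackLost = callbackLost;
        }

        @Override
        public void run() {
            // The actual work would happen here.
            done.set(true);
            if (!callbackLost) {
                callback.run(); // notify the watcher, like invoking the callback URL
            }
        }

        /** The status a poller would query. */
        boolean isDone() {
            return done.get();
        }
    }

    /** Returns "callback" or "polling", depending on how completion was detected. */
    static String awaitCompletion(boolean callbackLost, long pollIntervalMillis, int maxPolls)
            throws InterruptedException {
        CountDownLatch callbackReceived = new CountDownLatch(1);
        Task task = new Task(callbackReceived::countDown, callbackLost);
        new Thread(task).start();

        for (int i = 0; i < maxPolls; i++) {
            // Prefer the callback: wait up to one poll interval for it to arrive.
            if (callbackReceived.await(pollIntervalMillis, TimeUnit.MILLISECONDS)) {
                return "callback";
            }
            // No callback yet: poll the task's status directly.
            if (task.isDone()) {
                return "polling";
            }
        }
        throw new IllegalStateException("Task did not complete within " + maxPolls + " polls");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(awaitCompletion(false, 200, 10));
        System.out.println(awaitCompletion(true, 50, 10));
    }
}
```

In this sketch, `awaitCompletion(false, 200, 10)` is expected to return `"callback"`. By contrast, `awaitCompletion(true, 50, 10)` is expected to return `"polling"` once the task has finished. Preferring the callback keeps detection fast. The bounded polling loop ensures that a lost notification cannot leave the watcher waiting forever.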

The following figure shows how Oozie works at a high level:

Figure 16: Basic working of Oozie ...

ISBN: 9781787281349