O'Reilly logo

Hadoop MapReduce v2 Cookbook - Second Edition by Thilina Gunarathne

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Adding dependencies between MapReduce jobs

Often we require multiple MapReduce applications to be executed in a workflow-like manner to achieve our objective. Hadoop ControlledJob and JobControl classes provide a mechanism to execute a simple workflow graph of MapReduce jobs by specifying the dependencies between them.

In this recipe, we execute the log-grep MapReduce computation followed by the log-analysis MapReduce computation on an HTTP server log dataset. The log-grep computation filters the input data based on a regular expression. The log-analysis computation analyses the filtered data. Hence, the log-analysis computation is dependent on the log-grep computation. We use the ControlledJob class to express this dependency and use the JobControl ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required