I showed you one example of the Oozie coordinator, which offers cron-like capabilities to launch periodic Oozie workflows. The Oozie coordinator can also trigger a workflow based on data availability: if the input data isn't available, the workflow isn't triggered. For example, if an external process, or even a MapReduce job, generates data on a regular basis, you could use Oozie's data-driven coordinator to trigger a workflow that aggregates or processes that data.
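
To make the data-availability idea concrete, here is a minimal local simulation of the check a data-driven coordinator performs. Real Oozie coordinators are defined in XML, not Python. This sketch only mimics the logic as we understand it: expand a dataset URI template for one nominal time, then run the workflow only if that dataset instance's done-flag exists. We assume the flag is a `_SUCCESS` file, which MapReduce's output committer writes by default. We also assume that an empty flag means the directory's existence is enough. A local directory stands in for HDFS, and every name here (`URI_TEMPLATE`, `resolve_instance`, `is_available`, `maybe_trigger`) is hypothetical.

```python
import os
import tempfile
from datetime import datetime
from string import Template

# Hypothetical dataset definition, loosely modeled on an Oozie <dataset>:
# a URI template with ${YEAR}/${MONTH}/${DAY} variables plus a done-flag file.
URI_TEMPLATE = "${ROOT}/logs/${YEAR}/${MONTH}/${DAY}"
DONE_FLAG = "_SUCCESS"


def resolve_instance(root: str, nominal: datetime) -> str:
    """Expand the URI template for one nominal time (one dataset instance)."""
    return Template(URI_TEMPLATE).substitute(
        ROOT=root,
        YEAR=f"{nominal.year:04d}",
        MONTH=f"{nominal.month:02d}",
        DAY=f"{nominal.day:02d}",
    )


def is_available(instance_dir: str, done_flag: str = DONE_FLAG) -> bool:
    """An instance is ready when its done-flag file exists.
    Assumption: an empty done_flag means the directory alone signals readiness."""
    if not done_flag:
        return os.path.isdir(instance_dir)
    return os.path.isfile(os.path.join(instance_dir, done_flag))


def maybe_trigger(root: str, nominal: datetime, run_workflow) -> bool:
    """Run the workflow on the instance only if its input data is available."""
    instance = resolve_instance(root, nominal)
    if is_available(instance):
        run_workflow(instance)
        return True
    # A real coordinator would leave the action waiting and re-check later.
    return False


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for an HDFS root directory
    day = datetime(2024, 1, 15)
    print(maybe_trigger(root, day, print))  # no data yet, so nothing runs
    instance = resolve_instance(root, day)
    os.makedirs(instance)
    open(os.path.join(instance, DONE_FLAG), "w").close()  # producer finishes
    print(maybe_trigger(root, day, print))  # flag present, workflow runs
```

The design choice worth noting is that the trigger keys off an explicit completion marker rather than the mere appearance of files. As a result, a workflow won't start on a directory that a producer is still writing to.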

In this section, we covered three automated mechanisms that can be used for data ingress purposes. The first technique covered Flume, a powerful tool for shipping your log data into Hadoop, and the second technique looked at the HDFS File Slurper, which ...
