A data lake ingests data in its raw format. For meaningful analytics, however, this raw data needs to undergo a certain level of processing. As we have seen in the previous section, the storage and data-processing layers define certain zones that convert the raw data into something more useful for various kinds of analytics. While this conversion can be done with a series of map-reduce jobs, the main challenge is orchestrating these jobs across the zones, either at scheduled intervals or in response to triggers. Also, since the Hadoop landscape comprises multiple technologies and frameworks, different types of tasks may need to be executed as data moves from one zone to another. We will cover the mechanisms to achieve this using the Oozie framework ...
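To make this concrete, the following is a minimal sketch of what such an orchestration could look like in Oozie. A workflow runs a single hypothetical map-reduce action that moves data from a raw zone to a refined zone, and a coordinator schedules it daily; the mapper class, HDFS paths, and application names are illustrative assumptions rather than fixed conventions.

```xml
<!-- Workflow definition (e.g. workflow.xml): one map-reduce action that
     reads from the raw zone and writes to the refined zone.
     RawDataCleanserMapper and the /data/lake/... paths are hypothetical. -->
<workflow-app name="raw-to-refined" xmlns="uri:oozie:workflow:0.5">
    <start to="cleanse-raw-data"/>

    <action name="cleanse-raw-data">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>com.example.datalake.RawDataCleanserMapper</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/data/lake/raw/events</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/data/lake/refined/events</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Raw-to-refined processing failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

<!-- Coordinator definition (e.g. coordinator.xml): triggers the workflow
     above once a day; start/end dates are placeholders. -->
<coordinator-app name="daily-raw-to-refined" frequency="${coord:days(1)}"
                 start="2017-05-01T00:00Z" end="2018-05-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${nameNode}/apps/oozie/raw-to-refined</app-path>
        </workflow>
    </action>
</coordinator-app>
```

Splitting the definition this way separates the "what" (the workflow's chain of actions across zones) from the "when" (the coordinator's schedule or data-availability trigger), which is the core of the orchestration problem described above.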