May 2017
Beginner to intermediate
596 pages
15h 2m
English
Data processing is one of the core capabilities of a Data Lake implementation. Our Data Lake is no exception: it processes data in both the batch layer and the speed layer. This section covers the main topics to consider when a Data Lake handles data processing. In Hadoop 1.x, MapReduce was the main processing model. With Hadoop 2.x, and with the growth in data ingestion methods, more options for real-time and streaming processing have become available. Both aspects, batch and streaming, are detailed here along with some important considerations.