May 2017
Beginner to intermediate
596 pages
15h 2m
English
The following figure shows how far our Data Lake has come by the end of Part 2 of this book:
Figure 01: Data Lake implemented so far in this book
| HDFS | Distributed file storage |
| MapReduce | Batch processing engine |
| YARN | Resource negotiator |
| HBase | Columnar and key-value NoSQL database that runs on HDFS |
| Hive | Query engine that provides SQL-like access to HDFS |
| Impala | Fast query engine for analytical queries on HDFS |
| Sqoop | Data acquisition and ingestion |
| Flume | Data acquisition and ingestion via streamed Flume events |
| Kafka | Highly scalable distributed messaging engine ... |