April 2018
Beginner to intermediate
440 pages
11h 36m
English
Hadoop is an open-source framework for working with large quantities of data distributed across anywhere from a single computer to thousands of machines. Hadoop is composed of four modules:
Hadoop Common provides the shared libraries and utilities needed by the other three modules. HDFS (the Hadoop Distributed File System) is a Java-based file system designed to store large files across many machines. By large files, we are talking terabytes. YARN manages resource allocation and job scheduling in your Hadoop cluster. The MapReduce engine allows you to process data in parallel.
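The classic illustration of the MapReduce model is a word count. The sketch below is not Hadoop's Java API; it is a minimal, framework-free Python simulation of the three phases (map, shuffle, reduce) that the MapReduce engine performs for you at scale:

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one line of input.
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    # Reduce: combine all counts collected for a single word.
    return key, sum(values)

def mapreduce(lines):
    # Shuffle: group mapped pairs by key, as the framework would
    # before handing each group to a reducer.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

print(mapreduce(["big data big", "data"]))  # {'big': 2, 'data': 2}
```

In a real cluster, the map and reduce calls run on different machines and the shuffle moves data over the network, but the logic is the same.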
There are several other projects that can be installed to work with the Hadoop ...