Hadoop is a venerable technology now, the grand old man of distributed computing. We won't dwell on Hadoop's internals, but readers who don't come from a big-data background need a brief introduction for the rest of this chapter to make sense:
From a user's point of view, what really matters is the MapReduce programming paradigm. A user defines map and reduce tasks using the MapReduce API and submits them to the Hadoop cluster for execution:
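To make the paradigm concrete for readers new to it, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to the classic word-count problem. It is plain Python and does not use Hadoop's actual API (which is Java-based, built around `Mapper` and `Reducer` classes). The names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping stands in for the shuffle that a real cluster performs across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


# Hypothetical mapper: emits a (word, 1) pair for every word in one input line.
def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


# Hypothetical reducer: receives one key with all of its values and sums them.
def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    return word, sum(counts)


# Hypothetical driver that simulates a job in a single process.
def run_job(
    lines: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    # Map phase + shuffle: run the mapper on every record and group
    # the intermediate values by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    # Reduce phase: one reducer call per key, visiting keys in sorted order
    # (Hadoop likewise sorts keys before handing them to reducers).
    return dict(reducer(key, grouped[key]) for key in sorted(grouped))


if __name__ == "__main__":
    counts = run_job(["the quick fox", "The lazy dog"], map_fn, reduce_fn)
    print(counts)  # {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

The point of the split is that the user only writes the two small functions; everything in `run_job` (distributing input, grouping by key, scheduling reducers) is the part a framework like Hadoop takes off the user's hands and runs across a cluster.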
When a job gets triggered on the corresponding cluster, this brings ...