In This Chapter
Examining distributed data processing in Hadoop
Looking at MapReduce execution
Venturing into YARN architecture
Anticipating future directions for data processing on Hadoop
My, how time flies. If we had written this book a year (well, a few months) earlier, this chapter on data processing would have talked only about MapReduce, for the simple reason that MapReduce was then the only way to process data in Hadoop. With the release of Hadoop 2, however, YARN was introduced, ushering in a whole new world of data processing opportunities.
YARN stands for Yet Another Resource Negotiator — a rather modest label considering its key role in the Hadoop ecosystem. (The Yet Another label is a long-running gag in computer science that celebrates programmers’ propensity to be lazy about feature names.) A (Hadoop-centric) thumbnail sketch would describe YARN as a tool that enables other data processing frameworks to run on Hadoop. A more substantive take on YARN would describe it as a general-purpose resource ...