11. Using Apache Hadoop YARN Distributed-Shell

The Hadoop YARN project includes the Distributed-Shell application, which is an example of a non-MapReduce application built on top of YARN. Distributed-Shell is a simple mechanism for running shell commands and scripts in containers on multiple nodes in a Hadoop cluster. There are multiple existing implementations of a distributed shell that administrators typically use to manage a cluster of machines, and this application is a way to demonstrate how such a utility can be implemented on top of YARN.

More than just providing a parallel execution application, Distributed-Shell can be used as a starting point for exploring and building Hadoop YARN applications. This chapter is intended to be a guide ...

Get Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.