Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

ISBN: 9780133441925

by Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham

March 2014

Intermediate to advanced

400 pages

10h 7m

English

Addison-Wesley Professional

About This eBook
Title Page
Copyright Page
Contents
Foreword by Raymie Stata
Foreword by Paul Dix
Preface
Focus of the Book; Book Structure; Book Conventions; Additional Content and Accompanying Code
Acknowledgments
About the Authors
1. Apache Hadoop YARN: A Brief History and Rationale
Introduction; Apache Hadoop; Phase 0: The Era of Ad Hoc Clusters; Phase 1: Hadoop on Demand; Phase 2: Dawn of the Shared Compute Clusters; Phase 3: Emergence of YARN; Conclusion
2. Apache Hadoop YARN Install Quick Start
Getting Started; Steps to Configure a Single-Node YARN Cluster; Run Sample MapReduce Examples; Wrap-up
3. Apache Hadoop YARN Core Concepts
Beyond MapReduce; Apache Hadoop MapReduce; Apache Hadoop YARN; YARN Components; Wrap-up
4. Functional Overview of YARN Components
Architecture Overview; ResourceManager; YARN Scheduling Components; Containers; NodeManager; ApplicationMaster; YARN Resource Model; Managing Application Dependencies; Wrap-up
5. Installing Apache Hadoop YARN
The Basics; System Preparation; Script-based Installation of Hadoop 2; Script-based Uninstall; Configuration File Processing; Configuration File Settings; Start-up Scripts; Installing Hadoop with Apache Ambari; Wrap-up
6. Apache Hadoop YARN Administration
Script-based Configuration; Monitoring Cluster Health: Nagios; Real-time Monitoring: Ganglia; Administration with Ambari; JVM Analysis; Basic YARN Administration; Wrap-up
7. Apache Hadoop YARN Architecture Guide
Overview; ResourceManager; NodeManager; ApplicationMaster; YARN Containers; Summary for Application-writers; Wrap-up
8. Capacity Scheduler in YARN
Introduction to the Capacity Scheduler; Capacity Scheduler Configuration; Queues; Hierarchical Queues; Queue Access Control; Capacity Management with Queues; User Limits; Reservations; State of the Queues; Limits on Applications; User Interface; Wrap-up
9. MapReduce with Apache Hadoop YARN
Running Hadoop YARN MapReduce Examples; MapReduce Compatibility; The MapReduce ApplicationMaster; Calculating the Capacity of a Node; Changes to the Shuffle Service; Running Existing Hadoop Version 1 Applications; Running MapReduce Version 1 Existing Code; Advanced Features; Wrap-up
10. Apache Hadoop YARN Application Example
The YARN Client; The ApplicationMaster; Wrap-up
11. Using Apache Hadoop YARN Distributed-Shell
Using the YARN Distributed-Shell; Internals of the Distributed-Shell; Wrap-up
12. Apache Hadoop YARN Frameworks
Distributed-Shell; Hadoop MapReduce; Apache Tez; Apache Giraph; Hoya: HBase on YARN; Dryad on YARN; Apache Spark; Apache Storm; REEF: Retainable Evaluator Execution Framework; Hamster: Hadoop and MPI on the Same Cluster; Wrap-up
A. Supplemental Content and Code Downloads
Available Downloads
B. YARN Installation Scripts
install-hadoop2.sh; uninstall-hadoop2.sh; hadoop-xml-conf.sh
C. YARN Administration Scripts
configure-hadoop2.sh
D. Nagios Modules
check_resource_manager.sh; check_data_node.sh; check_resource_manager_old_space_pct.sh
E. Resources and Additional Information
F. HDFS Quick Reference
Quick Command Reference
Index

Content preview from Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

8. Capacity Scheduler in YARN

Typically, organizations start Apache Hadoop deployments as single-user environments or for a single team. As organizations derive more value from data processing and move toward mature cluster deployments, there are significant drivers to consolidate Hadoop clusters into a small number of scaled, shared clusters. This need is driven by the desire to minimize data fragmentation across multiple systems. Concentrating data on a few HDFS clusters frees it for organization-wide access, avoids data silos, and allows all-accommodating data-processing workflows. In addition, the operational costs and complexity of managing multiple small clusters are reduced.

Once the deployment architecture in ...
