YARN Essentials

Book Description

A comprehensive, hands-on guide to installing, administering, and configuring YARN

In Detail

YARN is the next-generation generic resource platform used to manage resources in a typical cluster, and it is designed to support multitenancy in its core architecture. Because optimal resource utilization is central to YARN's design, learning how to fully utilize the fine-grained resources available in the cluster (RAM, CPU cycles, and so on) becomes vital.

This book is an easy-to-follow, self-learning guide to help you start working with YARN. Beginning with an overview of YARN and Hadoop, you will dive into the pitfalls of Hadoop 1.x and see how YARN takes us to the next level. You will learn the concepts, terminology, architecture, core components, and key interactions; cover the installation and administration of a YARN cluster; and learn about YARN application development with new and emerging data processing frameworks.

What You Will Learn

  • Understand how existing MapReduce applications can run on top of YARN and how they are backward compatible
  • Explore the YARN concepts, terminology, architecture, key components, and the interactions between the components
  • Set up a standalone and multi-node clustered YARN environment
  • Design, develop, and run different frameworks such as MapReduce, Apache Storm, Apache Tez, and Apache Giraph on top of YARN
  • Get to grips with the built-in support for multitenancy in YARN
  • Discover the motivation behind YARN's architecture and implementation, and why YARN was needed
  • Learn how failures at each level are gracefully handled by the new framework to achieve fault tolerance and scalability

Table of Contents

  1. YARN Essentials
    1. Table of Contents
    2. YARN Essentials
    3. Credits
    4. About the Authors
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Need for YARN
      1. The redesign idea
        1. Limitations of the classical MapReduce or Hadoop 1.x
        2. YARN as the modern operating system of Hadoop
        3. What are the design goals for YARN
      2. Summary
    9. 2. YARN Architecture
      1. Core components of YARN architecture
        1. ResourceManager
        2. ApplicationMaster (AM)
        3. NodeManager (NM)
      2. YARN scheduler policies
        1. The FIFO (First In First Out) scheduler
        2. The fair scheduler
        3. The capacity scheduler
      3. Recent developments in YARN architecture
      4. Summary
    10. 3. YARN Installation
      1. Single-node installation
        1. Prerequisites
          1. Platform
          2. Software
        2. Starting with the installation
          1. The standalone mode (local mode)
          2. The pseudo-distributed mode
      2. The fully-distributed mode
        1. HistoryServer
        2. Slave files
      3. Operating Hadoop and YARN clusters
        1. Starting Hadoop and YARN clusters
        2. Stopping Hadoop and YARN clusters
      4. Web interfaces of the Ecosystem
      5. Summary
    11. 4. YARN and Hadoop Ecosystems
      1. The Hadoop 2 release
      2. A short introduction to Hadoop 1.x and MRv1
      3. MRv1 versus MRv2
      4. Understanding where YARN fits into Hadoop
      5. Old and new MapReduce APIs
      6. Backward compatibility of MRv2 APIs
        1. Binary compatibility of org.apache.hadoop.mapred APIs
        2. Source compatibility of org.apache.hadoop.mapred APIs
      7. Practical examples of MRv1 and MRv2
        1. Preparing the input file(s)
        2. Running the job
        3. Result
      8. Summary
    12. 5. YARN Administration
      1. Container allocation
        1. Container allocation to the application
      2. Container configurations
      3. YARN scheduling policies
        1. The FIFO (First In First Out) scheduler
        2. The capacity scheduler
          1. Capacity scheduler configurations
        3. The fair scheduler
          1. Fair scheduler configurations
      4. YARN multitenancy application support
      5. Administration of YARN
        1. Administrative tools
        2. Adding and removing nodes from a YARN cluster
        3. Administrating YARN jobs
        4. MapReduce job configurations
        5. YARN log management
        6. YARN web user interface
      6. Summary
    13. 6. Developing and Running a Simple YARN Application
      1. Running sample examples on YARN
        1. Running a sample Pi example
      2. Monitoring YARN applications with web GUI
      3. YARN's MapReduce support
        1. The MapReduce ApplicationMaster
        2. Example YARN MapReduce settings
        3. YARN's compatibility with MapReduce applications
        4. Developing YARN applications
      4. The YARN application workflow
        1. Writing the YARN client
        2. Writing the YARN ApplicationMaster
          1. Responsibilities of the ApplicationMaster
      5. Summary
    14. 7. YARN Frameworks
      1. Apache Samza
        1. Writing a Kafka producer
        2. Writing the hello-samza project
          1. Starting a grid
      2. Storm-YARN
        1. Prerequisites
          1. Hadoop YARN should be installed
          2. Apache ZooKeeper should be installed
        2. Setting up Storm-YARN
        3. Getting the storm.yaml configuration of the launched Storm cluster
        4. Building and running Storm-Starter examples
      3. Apache Spark
        1. Why run on YARN?
      4. Apache Tez
      5. Apache Giraph
      6. HOYA (HBase on YARN)
      7. KOYA (Kafka on YARN)
      8. Summary
    15. 8. Failures in YARN
      1. ResourceManager failures
      2. ApplicationMaster failures
      3. NodeManager failures
      4. Container failures
      5. Hardware Failures
      6. Summary
    16. 9. YARN – Alternative Solutions
      1. Mesos
      2. Omega
      3. Corona
      4. Summary
    17. 10. YARN – Future and Support
      1. What YARN means to the big data industry
      2. Journey – present and future
        1. Present on-going features
        2. Future features
      3. YARN-supported frameworks
      4. Summary
    18. Index