Hadoop Cluster Deployment

Book Description

Construct a modern Hadoop data platform effortlessly and gain insights into how to manage clusters efficiently

  • Choose the hardware and Hadoop distribution that best suits your needs
  • Get more value out of your Hadoop cluster with Hive, Impala, and Sqoop
  • Learn useful tips for performance optimization and security

In Detail

Big Data is the hottest trend in the IT industry at the moment. Companies are realizing the value of collecting, retaining, and analyzing as much data as possible. They are therefore rushing to implement the next generation of data platforms, and Hadoop is the centerpiece of these platforms.

This practical guide is filled with examples that show you how to build a data platform using Hadoop. Step-by-step instructions explain how to install, configure, and tie all major Hadoop components together. The book helps you avoid common pitfalls, follow best practices, and go beyond the basics when building a Hadoop cluster.

This book will walk you through the process of building a Hadoop cluster from the ground up. By using practical examples and command samples, you will be able to get a cluster up and running in no time, and you will also gain a deep understanding of how various Hadoop components work and interact with each other.

You will learn how to pick the right hardware for different types of Hadoop clusters and how the various Hadoop distributions differ. By the end of this book, you will be able to install and configure several of the most popular Hadoop ecosystem projects, including Hive, Impala, and Sqoop, and you will also get a look at the pros and cons of using Hadoop in the cloud.

Table of Contents

  1. Hadoop Cluster Deployment
    1. Table of Contents
    2. Hadoop Cluster Deployment
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Errata
        2. Piracy
        3. Questions
    8. 1. Setting Up Hadoop Cluster – from Hardware to Distribution
      1. Choosing Hadoop cluster hardware
        1. Choosing the DataNode hardware
        2. Low storage density cluster
        3. High storage density cluster
        4. NameNode and JobTracker hardware configuration
          1. The NameNode hardware
          2. The JobTracker hardware
        5. Gateway and other auxiliary services
        6. Network considerations
        7. Hadoop hardware summary
      2. Hadoop distributions
        1. Hadoop versions
        2. Choosing Hadoop distribution
        3. Cloudera Hadoop distribution
        4. Hortonworks Hadoop distribution
        5. MapR
      3. Choosing OS for the Hadoop cluster
      4. Summary
    9. 2. Installing and Configuring Hadoop
      1. Configuring OS for Hadoop cluster
        1. Choosing and setting up the filesystem
        2. Setting up Java Development Kit
        3. Other OS settings
        4. Setting up the CDH repositories
      2. Setting up NameNode
        1. JournalNode, ZooKeeper, and Failover Controller
        2. Hadoop configuration files
        3. NameNode HA configuration
        4. JobTracker configuration
          1. Configuring the job scheduler
            1. JobQueueTaskScheduler
            2. FairScheduler
            3. CapacityTaskScheduler
        5. DataNode configuration
          1. TaskTracker configuration
          2. Advanced Hadoop tuning
            1. hdfs-site.xml
            2. mapred-site.xml
            3. core-site.xml
      3. Summary
    10. 3. Configuring the Hadoop Ecosystem
      1. Hosting the Hadoop ecosystem
      2. Sqoop
        1. Installing and configuring Sqoop
        2. Sqoop import example
        3. Sqoop export example
      3. Hive
        1. Hive architecture
        2. Installing Hive Metastore
        3. Installing the Hive client
        4. Installing Hive Server
      4. Impala
        1. Impala architecture
        2. Installing Impala state store
        3. Installing the Impala server
      5. Summary
    11. 4. Securing Hadoop Installation
      1. Hadoop security overview
      2. HDFS security
      3. MapReduce security
      4. Hadoop Service Level Authorization
      5. Hadoop and Kerberos
        1. Kerberos overview
        2. Kerberos in Hadoop
          1. Configuring Kerberos clients
          2. Generating Kerberos principals
          3. Enabling Kerberos for HDFS
          4. Enabling Kerberos for MapReduce
      6. Summary
    12. 5. Monitoring Hadoop Cluster
      1. Monitoring strategy overview
      2. Hadoop Metrics
        1. JMX Metrics
        2. Monitoring Hadoop with Nagios
        3. Monitoring HDFS
        4. NameNode checks
        5. JournalNode checks
        6. ZooKeeper checks
      3. Monitoring MapReduce
        1. JobTracker checks
      4. Monitoring Hadoop with Ganglia
      5. Summary
    13. 6. Deploying Hadoop to the Cloud
      1. Amazon Elastic MapReduce
        1. Installing the EMR command-line interface
        2. Choosing the Hadoop version
        3. Launching the EMR cluster
          1. Temporary EMR clusters
          2. Preparing input and output locations
      2. Using Whirr
        1. Installing and configuring Whirr
      3. Summary
    14. Index