Hadoop 2.x Administration Cookbook

Name: Hadoop 2.x Administration Cookbook
Author: Aman Singh
ISBN: 9781787126732

May 2017

Intermediate to advanced

348 pages

7h 8m

English

Packt Publishing

Table of Contents
Hadoop 2.x Administration Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more | Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book

Who this book is for
Sections
Getting ready | How to do it… | How it works… | There's more… | See also
Conventions
Reader feedback
Customer support
Downloading the example code | Downloading the color images of this book | Errata | Piracy | Questions
1. Hadoop Architecture and Deployment
Introduction | Overview of Hadoop Architecture
Building and compiling Hadoop
Getting ready | How to do it... | How it works...
Installation methods
Getting ready | How to do it... | How it works...
Setting up host resolution
Getting ready | How to do it... | How it works...
Installing a single-node cluster - HDFS components
Getting ready | How to do it... | How it works... | There's more... | Setting up ResourceManager and NodeManager
Installing a single-node cluster - YARN components
Getting ready | How to do it... | How it works... | There's more... | See also
Installing a multi-node cluster
Getting ready | How to do it... | How it works...
Configuring the Hadoop Gateway node
Getting ready | How to do it... | How it works... | See also
Decommissioning nodes
Getting ready | How to do it... | How it works... | See also
Adding nodes to the cluster
Getting ready | How to do it... | How it works... | There's more...
2. Maintaining Hadoop Cluster HDFS
Introduction | Overview of HDFS
Configuring HDFS block size
Getting ready | How to do it... | How it works...
Setting up Namenode metadata location
Getting ready | How to do it... | How it works...
Loading data in HDFS
Getting ready | How to do it... | How it works...
Configuring HDFS replication
Getting ready | How to do it... | How it works... | See also
HDFS balancer
Getting ready | How to do it... | How it works...
Quota configuration
Getting ready | How to do it... | How it works...
HDFS health and FSCK
Getting ready | How to do it... | How it works... | See also
Configuring rack awareness
Getting ready | How to do it... | How it works... | See also
Recycle or trash bin configuration
Getting ready | How to do it... | How it works... | There's more...
Distcp usage
Getting ready | How to do it... | How it works...
Control block report storm
Getting ready | How to do it... | How it works...
Configuring Datanode heartbeat
Getting ready | How to do it... | How it works...
3. Maintaining Hadoop Cluster – YARN and MapReduce
Introduction
Running a simple MapReduce program
Getting ready | How to do it...
Hadoop streaming
Getting ready | How to do it... | How it works...
Configuring YARN history server
Getting ready | How to do it... | How it works... | There's more...
Job history web interface and metrics
Getting ready | How to do it... | How it works...
Configuring ResourceManager components
Getting ready | How to do it... | How it works... | There's more... | See also
YARN containers and resource allocations
Getting ready | How to do it... | How it works... | There's more... | See also
ResourceManager Web UI and JMX metrics
Getting ready | How to do it... | How it works...
Preserving ResourceManager states
Getting ready | How to do it... | How it works... | There's more...
4. High Availability
Introduction
Namenode HA using shared storage
Getting ready | How to do it... | How it works... | See also
ZooKeeper configuration
Getting ready | How to do it... | How it works...
Namenode HA using Journal node
Getting ready | How to do it... | How it works...
Resourcemanager HA using ZooKeeper
Getting ready | How to do it... | How it works…
Rolling upgrade with HA
Getting ready | How to do it... | How it works...
Configure shared cache manager
Getting ready | How to do it... | There's more... | See also
Configure HDFS cache
Getting ready | How to do it... | How it works... | See also
HDFS snapshots
Getting ready | How to do it... | How it works...
Configuring storage based policies
Getting ready | How to do it... | How it works...
Configuring HA for Edge nodes
Getting ready | How to do it... | How it works...
5. Schedulers
Introduction
Configuring users and groups
Getting ready | How to do it... | How it works... | See also
Fair Scheduler configuration
Getting ready | How to do it... | How it works...
Fair Scheduler pools
Getting ready | How to do it... | How it works...
Configuring job queues
Getting ready | How to do it... | How it works... | See also
Job queue ACLs
Getting ready | How to do it... | How it works... | See also
Configuring Capacity Scheduler
Getting ready | How to do it... | How it works... | See also
Queuing mappings in Capacity Scheduler
Getting ready | How to do it... | How it works...
YARN and Mapred commands
Getting ready | How to do it... | How it works...
YARN label-based scheduling
Getting ready | How to do it... | How it works...
YARN SLS
Getting ready | How to do it... | How it works...
6. Backup and Recovery
Introduction
Initiating Namenode saveNamespace
Getting ready | How to do it... | How it works...
Using HDFS Image Viewer
Getting ready | How to do it... | How it works...
Fetching parameters which are in-effect
Getting ready | How to do it... | How it works...
Configuring HDFS and YARN logs
Getting ready | How to do it... | How it works... | See also
Backing up and recovering Namenode
Getting ready | How to do it... | How it works... | See also
Configuring Secondary Namenode
Getting ready | How to do it... | How it works…
Promoting Secondary Namenode to Primary
Getting ready | How to do it... | How it works... | See also
Namenode recovery
Getting ready | How to do it... | How it works...
Namenode roll edits – online mode
Getting ready | How to do it... | How it works...
Namenode roll edits – offline mode
Getting ready | How to do it... | How it works...
Datanode recovery – disk full
Getting ready | How to do it... | How it works...
Configuring NFS gateway to serve HDFS
Getting ready | How to do it... | How it works...
Recovering deleted files
Getting ready | How to do it... | How it works...
7. Data Ingestion and Workflow
Introduction
Hive server modes and setup
Getting ready | How to do it... | How it works...
Using MySQL for Hive metastore
How to do it… | How it works...
Operating Hive with ZooKeeper
Getting ready | How to do it... | How it works...
Loading data into Hive
Getting ready | How to do it... | How it works... | See also
Partitioning and Bucketing in Hive
Getting ready | How to do it... | How it works... | See also
Hive metastore database
Getting ready | How to do it... | How it works... | See also
Designing Hive with credential store
Getting ready | How to do it... | How it works...
Configuring Flume
Getting ready | How to do it... | How it works...
Configure Oozie and workflows
Getting ready | How to do it... | How it works...
8. Performance Tuning
Tuning the operating system
Getting ready | How to do it... | How it works... | See also
Tuning the disk
Getting ready | How to do it... | How it works...
Tuning the network
Getting ready | How to do it... | How it works...
Tuning HDFS
Getting ready | How to do it... | How it works...
Tuning Namenode
Getting ready | How to do it... | There's more... | See also
Tuning Datanode
Getting ready | How to do it... | How it works... | See also
Configuring YARN for performance
Getting ready | How to do it... | How it works...
Configuring MapReduce for performance
Getting ready | How to do it... | How it works...
Hive performance tuning
Getting ready | How to do it... | There's more... | How it works...
Benchmarking Hadoop cluster
Getting ready | How to do it... | Benchmark 1: Testing HDFS with TestDFSIO | Benchmark 2: Stress testing Namenode | Benchmark 3: MapReduce testing by generating small files | Benchmark 4: TeraGen, TeraSort, and TeraValidate benchmarks | There's more... | How it works...
9. HBase Administration
Introduction
Setting up single node HBase cluster
Getting ready | How to do it... | How it works...
Setting up multi-node HBase cluster
Getting ready | How to do it... | How it works...
Inserting data into HBase
Getting ready | How to do it... | How it works...
Integration with Hive
Getting ready | How to do it... | How it works... | See also
HBase administration commands
Getting ready | How to do it... | How it works... | See also
HBase backup and restore
Getting ready | How to do it... | How it works...
Tuning HBase
Getting ready | How to do it... | How it works...
HBase upgrade
Getting ready | How to do it... | How it works...
Migrating data from MySQL to HBase using Sqoop
Getting ready | How to do it...
10. Cluster Planning
Introduction
Disk space calculations
Getting ready | How to do it... | How it works...
Nodes needed in the cluster
Getting ready | How to do it... | How it works... | See also
Memory requirements
Getting ready | How to do it... | How it works... | See also
Sizing the cluster as per SLA
Getting ready | How to do it... | How it works... | See also
Network design
Getting ready | How to do it... | How it works...
Estimating the cost of the Hadoop cluster
How to do it... | How it works...
Hardware and software options
How it works...
11. Troubleshooting, Diagnostics, and Best Practices
Introduction
Namenode troubleshooting
Getting ready | How to do it... | How it works... | See also
Datanode troubleshooting
Getting ready | How to do it... | How it works... | See also
Resourcemanager troubleshooting
Getting ready | How to do it… | How it works... | See also
Diagnose communication issues
Getting ready | How to do it... | How it works...
Parse logs for errors
Getting ready | How to do it... | How it works...
Hive troubleshooting
Getting ready | How to do it... | How it works... | See also
HBase troubleshooting
Getting ready | How to do it... | How it works...
Hadoop best practices
How it works...
12. Security
Introduction
Encrypting disk using LUKS
Getting ready | How to do it... | How it works... | See also
Configuring Hadoop users
Getting ready | How to do it... | How it works...
HDFS encryption at Rest
Getting ready | How to do it... | How it works...
Configuring SSL in Hadoop
Getting ready | How to do it... | How it works... | See also
In-transit encryption
Getting ready | How to do it... | There's more... | See also
Enabling service level authorization
Getting ready | How to do it... | How it works... | See also
Securing ZooKeeper
Getting ready | How to do it... | How it works...
Configuring auditing
Getting ready | How to do it... | How it works...
Configuring Kerberos server
Getting ready | How to do it... | How it works...
Configuring and enabling Kerberos for Hadoop
Getting ready | How to do it... | How it works...
Index

Content preview from Hadoop 2.x Administration Cookbook

Installing a multi-node cluster

In the previous recipes, we looked at how to configure a single-node Hadoop cluster, also referred to as a pseudo-distributed cluster. In this recipe, we will set up a fully distributed cluster, with the daemons spread across separate nodes.

There will be one node for the Namenode, one for the ResourceManager, and four nodes that each run both a Datanode and a NodeManager. In production, the number of Datanodes could be in the thousands, but here we restrict ourselves to four. The Datanode and NodeManager coexist on the same nodes for the purposes of data locality and locality of reference.
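
As a rough illustration of that layout, the six nodes can be written down as a host-to-daemon map. This is a minimal sketch in Python and is not taken from the book: the hostnames (`nn1`, `rm1`, `dn1` to `dn4`) and the helper `hosts_running` are hypothetical names chosen for the example.

```python
# Hypothetical hostnames: the recipe preview does not name the nodes.
CLUSTER_LAYOUT = {
    "nn1": ["Namenode"],
    "rm1": ["ResourceManager"],
    "dn1": ["Datanode", "NodeManager"],
    "dn2": ["Datanode", "NodeManager"],
    "dn3": ["Datanode", "NodeManager"],
    "dn4": ["Datanode", "NodeManager"],
}


def hosts_running(layout, daemon):
    """Return the sorted list of hosts that run the given daemon."""
    return sorted(host for host, daemons in layout.items() if daemon in daemons)


# Datanode and NodeManager share the same four hosts, so a YARN container
# can be scheduled on a node that already stores the HDFS blocks it reads.
datanode_hosts = hosts_running(CLUSTER_LAYOUT, "Datanode")
nodemanager_hosts = hosts_running(CLUSTER_LAYOUT, "NodeManager")
```

Comparing `datanode_hosts` with `nodemanager_hosts` is a quick way to confirm that the two worker daemons are planned for the same machines.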

Getting ready

Make sure that the six nodes you choose have the JDK installed and name resolution working. This could be done by making ...
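
One way to confirm that name resolution works before installing anything is to try resolving each hostname from every node. The following is a minimal sketch, assuming Python 3 is available on the node running the check; `unresolved_hosts` is an illustrative helper, not a Hadoop tool, and any hostnames passed to it are your own.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that fail to resolve to an IP address.

    `resolve` defaults to the system resolver (DNS or /etc/hosts) and
    can be swapped for a stub when testing.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed
```

Calling `unresolved_hosts` with the six cluster hostnames returns the names that could not be resolved; an empty list means every name resolved on that node. The check says nothing about the JDK, which has to be verified separately.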
