Chapter 2. Maintaining Hadoop Cluster HDFS

In this chapter, we will cover the following recipes:

  • Configuring HDFS block size
  • Setting up Namenode metadata location
  • Loading data into HDFS
  • Configuring HDFS replication
  • HDFS balancer
  • Quota configuration
  • HDFS health and FSCK
  • Configuring rack awareness
  • Recycle or trash bin configuration
  • Distcp usage
  • Controlling block report storm
  • Configuring Datanode heartbeat

Introduction

In this chapter, we will take a look at the storage layer, HDFS, and at how it can be configured to store data. It is important to keep this distributed filesystem healthy and to make sure that the data it contains remains available, even when failures occur. We will look at replication, quota setup, ...
