Chapter 2. Storage

After the overview of Hadoop in the previous chapter, we now look at its component parts in more detail. This chapter starts at the conceptual bottom of the stack: the means and mechanisms for storing data within Hadoop. In particular, we will:

  • Describe the architecture of the Hadoop Distributed File System (HDFS)
  • Show what enhancements to HDFS have been made in Hadoop 2
  • Explore how to access HDFS using command-line tools and the Java API
  • Give a brief description of ZooKeeper, another (sort of) filesystem within Hadoop
  • Survey considerations for storing data in Hadoop and the available file formats

In Chapter 3, Processing – MapReduce and Beyond, we will describe how Hadoop ...
