Chapter 4

Storing Data in Hadoop: The Hadoop Distributed File System

In This Chapter

Seeing how HDFS stores files in blocks

Looking at HDFS components and architecture

Scaling out HDFS

Working with checkpoints

Federating your NameNode

Putting HDFS to the availability test

When it comes to the core Hadoop infrastructure, you have two components: storage and processing. The Hadoop Distributed File System (HDFS) is the storage component. In short, HDFS provides a distributed architecture for extremely large-scale storage, which can easily be extended by adding more nodes (scaling out).
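
To make this concrete, here is a minimal sketch of writing a small file into HDFS and asking the NameNode where its blocks landed, using the standard org.apache.hadoop.fs.FileSystem Java API. The cluster address and file path are placeholders for illustration, not values from this book.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            // Point the client at the cluster; fs.defaultFS normally comes
            // from core-site.xml (the address below is a placeholder).
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/dummies/sample.txt");

            // Write a small file; HDFS splits larger files into blocks
            // and spreads those blocks across the DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("Hello, HDFS!\n");
            }

            // Ask the NameNode where the file's blocks are stored.
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println(block);
            }
        }
    }

Running a sketch like this against a cluster would print one line per block, listing the DataNodes that hold a copy of it, which is the scaling-out idea in miniature.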

Let us remind you why this is a big deal. In the late 1990s, after the Internet established itself as a fixture in society, Google faced the major challenge of storing and processing not only all the pages on the Internet but also Google users’ web log data. Google’s major claim to fame, then and now, ...
