O'Reilly logo

Hadoop For Dummies by Dirk deRoos

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 4

Storing Data in Hadoop: The Hadoop Distributed File System

In This Chapter

arrow Seeing how HDFS stores files in blocks

arrow Looking at HDFS components and architecture

arrow Scaling out HDFS

arrow Working with checkpoints

arrow Federating your NameNode

arrow Putting HDFS to the availability test

When it comes to the core Hadoop infrastructure, you have two components: storage and processing. The Hadoop Distributed File System (HDFS) is the storage component. In short, HDFS provides a distributed architecture for extremely large scale storage, which can easily be extended by scaling out.

Let us remind you why this is a big deal. In the late 1990s, after the Internet established itself as a fixture in society, Google was facing the major challenge of having to be able to store and process not only all the pages on the Internet but also Google users’ web log data. Google’s major claim to fame, then and now, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required