Chapter 4

Storing Data in Hadoop: The Hadoop Distributed File System

In This Chapter

Seeing how HDFS stores files in blocks

Looking at HDFS components and architecture

Scaling out HDFS

Working with checkpoints

Federating your NameNode

Putting HDFS to the availability test

When it comes to the core Hadoop infrastructure, you have two components: storage and processing. The Hadoop Distributed File System (HDFS) is the storage component. In short, HDFS provides a distributed architecture for extremely large-scale storage, which can easily be extended by adding more nodes (scaling out).
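
To make this concrete, here is a minimal sketch of writing a small file into HDFS and asking the NameNode where its blocks landed, using the standard org.apache.hadoop.fs.FileSystem Java API. The cluster address and file path are placeholders for illustration, not values from this book.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            // Point the client at the cluster; fs.defaultFS normally comes
            // from core-site.xml (the address below is a placeholder).
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/dummies/sample.txt");

            // Write a small file; HDFS splits larger files into blocks
            // and spreads those blocks across the DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeBytes("Hello, HDFS!\n");
            }

            // Ask the NameNode where the file's blocks are stored.
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println(block);
            }
        }
    }

Running a sketch like this against a cluster would print one line per block, listing the DataNodes that hold a copy of it, which is the scaling-out idea in miniature.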

Let us remind you why this is a big deal. In the late 1990s, after the Internet established itself as a fixture in society, Google faced the major challenge of storing and processing not only all the pages on the Internet but also Google users’ web log data. Google’s major claim to fame, then and now, ...
