Storing Data in Hadoop: The Hadoop Distributed File System
In This Chapter
Seeing how HDFS stores files in blocks
Looking at HDFS components and architecture
Scaling out HDFS
Working with checkpoints
Federating your NameNode
Putting HDFS to the availability test
When it comes to the core Hadoop infrastructure, you have two components: storage and processing. The Hadoop Distributed File System (HDFS) is the storage component. In short, HDFS provides a distributed architecture for extremely large-scale storage, which can easily be extended by scaling out.
Let us remind you why this is a big deal. In the late 1990s, after the Internet established itself as a fixture in society, Google faced the major challenge of storing and processing not only all the pages on the Internet but also Google users’ web log data. Google’s major claim to fame, then and now, ...