HDFS
HDFS is the Hadoop Distributed File System. Modeled on GFS, Google File System, it allows the storage and processing of large amounts of data. HDFS is modeled as a file system, with support for files larger than what could fit on a single machine. Files are stored as collections of blocks, spread out across a collection of machines.
The data blocks are stored at DataNodes, which handles all of the I/O for the blocks they are responsible for. The block assignment metadata is stored at the NameNode. The NameNode is responsible for admin and meta operations, such as creating directories, listing directories, file moves, and so on. The HDFS client implements the file system interface and provides client applications the logical abstraction ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access