HDFS Write
The client connects to the NameNode and asks for permission to write a file to HDFS. The NameNode consults its metadata and plans the write: how the file is split into blocks, which DataNodes will store each block, and which replication strategy applies. The NameNode never handles the file data itself; it only tells the client where to write. The client then sends each block to the first DataNode chosen for it, and replication proceeds as a pipeline through the DataNodes that the NameNode selected according to the replication strategy. The DataNode that received the block from the client forwards it to the second DataNode (where the next copy of the block is to be written), and the second DataNode forwards it to a third DataNode (if the replication factor is 3).
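The write path described above can be sketched as a small in-memory simulation. This is a toy model rather than the Hadoop API: the class names `NameNode` and `DataNode`, the `plan_write` and `write_file` helpers, the block ID format, and the round-robin placement are all assumptions made for illustration. Real HDFS uses a rack-aware placement policy and streams each block in packets. The sketch only shows the division of labor: the NameNode plans, the client writes to the first DataNode, and each DataNode forwards the block to the next one in the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode: stores a block, then forwards it down the pipeline."""
    name: str
    blocks: dict = field(default_factory=dict)

    def receive(self, block_id, data, downstream):
        # Store the block locally, then pass it to the next DataNode, if any.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].receive(block_id, data, downstream[1:])


class NameNode:
    """Toy NameNode: plans block placement but never touches file data."""

    def __init__(self, datanodes, block_size, replication):
        # Assumption: at least as many DataNodes as replicas per block.
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.block_map = {}  # block_id -> names of the DataNodes holding it

    def plan_write(self, path, length):
        # Hypothetical round-robin placement; stands in for the real policy.
        n_blocks = -(-length // self.block_size)  # ceiling division
        plan = []
        for i in range(n_blocks):
            block_id = f"{path}#blk_{i}"
            pipeline = [
                self.datanodes[(i + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            self.block_map[block_id] = [dn.name for dn in pipeline]
            plan.append((block_id, pipeline))
        return plan


def write_file(namenode, path, data):
    """Client side: ask the NameNode for a plan, then write block by block."""
    plan = namenode.plan_write(path, len(data))
    size = namenode.block_size
    for i, (block_id, pipeline) in enumerate(plan):
        chunk = data[i * size:(i + 1) * size]
        # The client talks only to the first DataNode; the rest is forwarding.
        pipeline[0].receive(block_id, chunk, pipeline[1:])


def demo():
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    namenode = NameNode(nodes, block_size=4, replication=3)
    write_file(namenode, "/demo.txt", b"hello hdfs")
    for block_id, holders in namenode.block_map.items():
        print(block_id, "->", holders)
```

Calling `demo()` prints the DataNodes that hold each block. Note that the `NameNode` object records only metadata (`block_map`), while the bytes live in the `DataNode` objects, which mirrors the statement that the NameNode does not handle any data.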
The following is the flow of a write request from ...