Hadoop in Practice by Alex Holmes

Appendix C. HDFS dissected

If you’re using Hadoop, you should have a solid understanding of HDFS so that you can make smart decisions about how to manage your data. In this appendix we’ll walk through how HDFS reads and writes files, to help you better understand how HDFS works behind the scenes.

C.1. What is HDFS?

HDFS is a distributed filesystem modeled on the Google File System (GFS), details of which were published in a 2003 paper.[1] Google’s paper highlighted a number of key architectural and design properties, the most interesting of which included optimizations to reduce network input/output (I/O), how data replication should occur, and overall system availability and scalability. Not many details about GFS are known beyond those published ...
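To make the link between replication and network I/O more concrete, here is a minimal sketch of rack-aware replica placement. It approximates what HDFS's default placement policy is documented to do for a replication factor of 3:

* The first replica goes on the writer's own node, which avoids a network hop.
* The second goes on a node in a different rack, so losing a whole rack does not lose the block.
* The third goes on a different node in the same rack as the second, which limits cross-rack traffic.

The `DataNode` class, the `place_replicas` function, and the two-rack topology are hypothetical illustrations, not Hadoop APIs. Real HDFS also takes node load and free space into account, and this sketch ignores both.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataNode:
    """Hypothetical stand-in for a cluster node: just a name and a rack ID."""
    name: str
    rack: str


def place_replicas(writer: DataNode, nodes: List[DataNode], replication: int = 3,
                   rng: Optional[random.Random] = None) -> List[DataNode]:
    """Pick distinct target nodes for one block (simplified rack-aware policy)."""
    if not nodes:
        raise ValueError("no DataNodes available")
    rng = rng or random.Random()
    # Never ask for more replicas than there are distinct nodes.
    replication = min(replication, len(nodes))
    chosen: List[DataNode] = []

    def unused(candidates):
        # Nodes from `candidates` that do not already hold a replica.
        return [n for n in candidates if n not in chosen]

    # Replica 1: the writer's own node if it is a DataNode (no network hop),
    # otherwise a random node.
    chosen.append(writer if writer in nodes else rng.choice(nodes))

    # Replica 2: prefer a different rack so a rack failure can't lose the block.
    if len(chosen) < replication:
        off_rack = unused(n for n in nodes if n.rack != chosen[0].rack)
        chosen.append(rng.choice(off_rack or unused(nodes)))

    # Replica 3: same rack as replica 2 but a different node, keeping this
    # copy off the cross-rack links.
    if len(chosen) < replication:
        same_rack = unused(n for n in nodes if n.rack == chosen[1].rack)
        chosen.append(rng.choice(same_rack or unused(nodes)))

    # Any further replicas: random nodes that don't hold one yet.
    while len(chosen) < replication:
        chosen.append(rng.choice(unused(nodes)))
    return chosen


if __name__ == "__main__":
    # Hypothetical topology: two racks with three nodes each.
    cluster = [DataNode(f"dn{i}", f"/rack{i // 3}") for i in range(6)]
    targets = place_replicas(cluster[0], cluster, rng=random.Random(7))
    print([(n.name, n.rack) for n in targets])
```

With three replicas spread over only two racks, each block survives a full rack outage. Only one of the two pipeline hops between replica nodes (the hop from the first node to the second) has to cross racks.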
