Hadoop Distributed File System

Hadoop Distributed File System (HDFS) is a filesystem spread across multiple servers and is designed to run on low-cost commodity hardware. HDFS follows a write-once, read-many philosophy. It was designed for large-scale batch processing of large to enormous files.

Files are divided into blocks; a typical block size is 128 MB. A file on HDFS is sliced into these 128 MB chunks, and the blocks are distributed across different data nodes. Files in HDFS normally range from gigabytes to terabytes in size.
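To make the block layout concrete, the following is a minimal sketch using the Hadoop Java FileSystem API to list a file's block size and the data nodes holding each block. The cluster URI (hdfs://namenode:8020) and file path (/data/events.log) are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.util.Arrays;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address; replace with your cluster's URI.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Hypothetical file path used for illustration only.
        Path file = new Path("/data/events.log");
        FileStatus status = fs.getFileStatus(file);

        // Block size the file was written with (typically 128 MB).
        System.out.println("Block size: " + status.getBlockSize() + " bytes");

        // Each BlockLocation describes one block and the data nodes holding it.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}
```

Running this against a large file would print one line per 128 MB block, showing how the file is physically spread over the cluster.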

HDFS was designed for batch processing rather than low-latency, interactive queries from users. HDFS is not meant for files that change frequently with data updates. New data is typically appended to existing files or added as new files.
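The append-oriented write pattern can be sketched with the same FileSystem API. This is a minimal illustration, reusing the hypothetical cluster URI and file path from above; existing file contents are never rewritten in place, and new records simply go onto the end of the file.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode address and file path, for illustration only.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path file = new Path("/data/events.log");

        // Open the existing file for append and write a new record to its end.
        try (FSDataOutputStream out = fs.append(file)) {
            out.write("sensor-42,2023-01-01T00:00:00Z,21.7\n"
                    .getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```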
