September 2014
Intermediate to advanced
512 pages
13h 54m
English
If you’ve been thinking about how to work with Hadoop in production settings, you’ll benefit from this part of the book, which covers the first set of hurdles you’ll need to clear. These chapters detail the often-overlooked yet crucial topics of data management in Hadoop.
Chapter 3 looks at ways to work with data stored in different formats, such as XML and JSON, paving the way for a broader examination of data formats, such as Avro and Parquet, that are best suited to big data and Hadoop.
Chapter 4 examines strategies for laying out your data in HDFS and for partitioning and compacting it. This chapter also covers ways of working with small files, as well as how compression can save you from many storage ...