Hadoop in Practice, Second Edition

by Alex Holmes
September 2014
Content level: Intermediate to advanced
512 pages
13h 54m
English
Manning Publications
Content preview from Hadoop in Practice, Second Edition

Part 2. Data logistics

If you’ve been thinking about how to work with Hadoop in production settings, you’ll benefit from this part of the book, which covers the first set of hurdles you’ll need to jump. These chapters detail the often-overlooked yet crucial topics that deal with data management in Hadoop.

Chapter 3 looks at ways to work with data stored in different formats, such as XML and JSON, paving the way for a broader examination of data formats such as Avro and Parquet that work best with big data and Hadoop.

Chapter 4 examines some strategies for laying out your data in HDFS, and partitioning and compacting your data. This chapter also covers ways of working with small files, as well as how compression can save you from many storage ...

Publisher Resources

ISBN: 9781617292224