Chapter 4. Organizing and optimizing data in HDFS

This chapter covers

  • Tips for laying out and organizing your data
  • Data access patterns to optimize reading and writing your data
  • The importance of compression, and choosing the best codec for your needs

In the previous chapter, we looked at how to work with different file formats in MapReduce and which ones were ideally suited for storing your data. Once you’ve homed in on the data format that you’ll be using, it’s time to start thinking about how you’ll organize your data in HDFS. It’s important to give yourself enough time early in the design of your Hadoop system to understand how your data will be accessed, so that you can optimize for the more important use cases that you’ll be ...
