Hadoop in Practice, Second Edition by Alex Holmes

Chapter 4. Organizing and optimizing data in HDFS

This chapter covers

  • Tips for laying out and organizing your data
  • Data access patterns to optimize reading and writing your data
  • The importance of compression, and choosing the best codec for your needs

In the previous chapter, we looked at how to work with different file formats in MapReduce and which ones are best suited for storing your data. Once you’ve homed in on the data format that you’ll be using, it’s time to start thinking about how you’ll organize your data in HDFS. It’s important to give yourself enough time early in the design of your Hadoop system to understand how your data will be accessed, so that you can optimize for the more important use cases that you’ll be ...
