Hadoop in Practice by Alex Holmes

Chapter 5. Streamlining HDFS for big data

This chapter covers
  • Understanding how to work with small files
  • Working with compression
  • Choosing the best codec for splittability and performance

In the previous chapter we looked at how to work effectively with MapReduce and big data. It’s tempting to spend most of your time thinking about MapReduce because it’s the computational layer, but you shouldn’t neglect HDFS when wrestling with big data: improvements in how you use HDFS pay great dividends in the performance of your MapReduce jobs.

With that in mind, this chapter is dedicated to looking at ways to efficiently store and access big data in HDFS. The first subject I’ll address is how to work with a large number of small files ...
