Chapter 6. Diagnosing and tuning performance problems

In this chapter

Measuring and visualizing MapReduce execution times
Optimizing the shuffle and sort phases
Improving performance with user space MapReduce best practices

Imagine you’ve written a new piece of MapReduce code and you’re executing it on your shiny new cluster. You’re surprised to learn that despite having a good-sized cluster, your job is running significantly longer than you expected. You’ve clearly hit a performance issue with your job, but how do you figure out where the problem lies?

One of Hadoop’s selling points when it comes to performance is that it scales horizontally. This means that adding nodes tends to yield a linear increase in throughput, and often in job execution ...


Hadoop in Practice by Alex Holmes
