Hadoop For Dummies by Dirk deRoos

Chapter 9

Statistical Analysis in Hadoop

In This Chapter

Scaling out statistical analysis with Hadoop

Gaining an understanding of Mahout

Working with R on Hadoop

Big data is all about applying analytics to more data, for more people. To carry out this task, big data practitioners use new tools — such as Hadoop — to explore and understand data in ways that previously might not have been possible (problems that were “too difficult,” “too expensive,” or “too slow”). Some of the “bigger analytics” that you often hear mentioned when Hadoop comes up in a conversation revolve around concepts such as machine learning, data mining, and predictive analytics. Now, what’s the common thread that runs through all these methods? That’s right: they all use good old-fashioned statistical analysis.

In this chapter, we explore some of the challenges that arise when you try to use traditional statistical tools on a Hadoop-level scale — a massive scale, in other words. We also introduce you to some common, Hadoop-specific statistical tools and show you when it makes sense to use them.

Pumping Up Your Statistical Analysis

Statistical analytics is far from being a new kid on the block, and it is certainly ...
