Chapter 9

Statistical Analysis in Hadoop

In This Chapter

Scaling out statistical analysis with Hadoop

Gaining an understanding of Mahout

Working with R on Hadoop

Big data is all about applying analytics to more data, for more people. To carry out this task, big data practitioners use new tools, such as Hadoop, to explore and understand data in ways that previously might not have been possible, tackling problems once dismissed as “too difficult,” “too expensive,” or “too slow.” Some of the “bigger analytics” that you often hear mentioned when Hadoop comes up in a conversation revolve around concepts such as machine learning, data mining, and predictive analytics. Now, what’s the common thread that runs through all these methods? That’s right: they all use good old-fashioned statistical analysis.

In this chapter, we explore some of the challenges that arise when you try to use traditional statistical tools on a Hadoop-level scale — a massive scale, in other words. We also introduce you to some common, Hadoop-specific statistical tools and show you when it makes sense to use them.

Pumping Up Your Statistical Analysis

Statistical analytics is far from being a new kid on the block, and it is certainly ...
