In This Chapter
Scaling out statistical analysis with Hadoop
Gaining an understanding of Mahout
Working with R on Hadoop
Big data is all about applying analytics to more data, for more people. To carry out this task, big data practitioners use new tools, such as Hadoop, to explore and understand data in ways that previously might not have been possible, tackling problems that were once written off as “too difficult,” “too expensive,” or “too slow.” Some of the “bigger analytics” that you often hear mentioned when Hadoop comes up in conversation revolve around concepts such as machine learning, data mining, and predictive analytics. Now, what’s the common thread that runs through all these methods? That’s right: they all use good old-fashioned statistical analysis.
In this chapter, we explore some of the challenges that arise when you try to use traditional statistical tools on a Hadoop-level scale — a massive scale, in other words. We also introduce you to some common, Hadoop-specific statistical tools and show you when it makes sense to use them.
Statistical analytics is far from being a new kid on the block, and it is certainly ...