Chapter 8. Integrating R and Hadoop for statistics and more


This chapter covers
  • Integrating your R scripts with MapReduce and Streaming
  • Understanding Rhipe, RHadoop, and R + Streaming


R is a statistical programming language for analyzing data and graphing the results. The capabilities of R[1] let you perform statistical and predictive analytics, data mining, and visualization on your data. Its breadth of coverage and its applicability across sectors such as finance, life sciences, manufacturing, and retail make it a popular tool.

1 R includes built-in packages as well as user-created packages, which can be accessed via CRAN, its package distribution system; see (
