Chapter 7. RHIPE
This chapter is a guide to Saptarshi Guha’s RHIPE package, the R and Hadoop Integrated Processing Environment. RHIPE’s development history dates back to 2009, and it is still actively maintained by the original author.
Compared to R+Hadoop, RHIPE shields you from raw Hadoop but still requires an understanding of the MapReduce model.
Because the previous two chapters covered a lot of MapReduce and Hadoop detail, this chapter takes a very short route to the examples.
Quick Look
Motivation: You like the power of MapReduce, as explained in the previous chapter, but you want something a little more R-centric.
Solution: Use the RHIPE R package as your Hadoop emissary. Even though you’ll still have to understand MapReduce, you won’t have to touch Hadoop directly.
Good because: You get Hadoop’s power without leaving the comfy confines of R’s language and interactive shell. (RHIPE even includes tools to work with HDFS.) This means you can MapReduce through a mountain of data during an interactive session of exploratory analysis.
How It Works
RHIPE sits between you and Hadoop. You write your Map and Reduce functions as R code, and RHIPE handles the scut work of invoking Hadoop commands.
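To make that division of labor concrete, here is a minimal plain-R sketch of the three steps a MapReduce job goes through: map, shuffle/sort (grouping by key), and reduce. It counts words in a tiny in-memory vector. It does not call RHIPE or Hadoop at all; the names `map_fn`, `reduce_fn`, and `run_local_mapreduce` are hypothetical, and the grouping step that Hadoop performs for you between the two phases is imitated here with `split()`.

```r
# Hypothetical local simulation of the MapReduce flow that RHIPE drives on Hadoop.
# Nothing here uses RHIPE or Hadoop; all function names are illustrative only.

# Mapper: takes one input record (a line of text) and returns key/value pairs,
# here one (word, 1) pair per word.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]  # drop empty strings left by leading separators
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reducer: takes one key and every value emitted for that key, returns the aggregate.
reduce_fn <- function(key, values) {
  sum(unlist(values))
}

# Driver: stands in for the plumbing Hadoop provides between the two phases.
run_local_mapreduce <- function(records, mapper, reducer) {
  # Map phase: run the mapper on each record, flatten to one list of pairs.
  pairs  <- unlist(lapply(records, mapper), recursive = FALSE)
  keys   <- vapply(pairs, function(p) p$key, character(1))
  values <- lapply(pairs, function(p) p$value)
  # Shuffle/sort phase: group all values that share a key.
  groups <- split(values, keys)
  # Reduce phase: one reducer call per distinct key; result is named by key.
  mapply(reducer, names(groups), groups, SIMPLIFY = TRUE)
}

counts <- run_local_mapreduce(c("the quick brown fox", "The lazy dog"),
                              map_fn, reduce_fn)
print(counts)
```

The point of the sketch is that the mapper and reducer only ever see one record or one key's values at a time. That constraint is what lets Hadoop spread the same two functions across many machines.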
To give you a quick example, here’s a typical RHIPE call:
rhipe.job.def <- rhmr(
    map= ... ,      # block of R code for the Mapper
    reduce= ... ,   # block of R code for the Reducer
    ifolder="/path/to/input" ,
    ofolder="/path/to/output" ,
    ...             # a couple of other RHIPE options
)
rhex( rhipe.job.def )
That’s it! There’s no ...