Chapter 7. RHIPE
This chapter is a guide to Saptarshi Guha’s RHIPE package, the R and
Hadoop Integrated Programming Environment. RHIPE’s development history
dates back to 2009 and it is still actively maintained by the original
author.
Compared to R+Hadoop, RHIPE shields you from raw Hadoop but still
requires an understanding of the MapReduce model.
Since the previous two chapters covered MapReduce and Hadoop in detail, this chapter takes a short route to the examples.
Quick Look
Motivation: You like the power of MapReduce, as explained in the previous chapter, but you want something a little more R-centric.
Solution: Use the RHIPE R package as your Hadoop emissary. Even though
you’ll still have to understand MapReduce, you won’t have to directly
touch Hadoop.
Good because: You get Hadoop’s power without leaving the comfy confines
of R’s language and interactive shell. (RHIPE even includes tools to work
with HDFS.) This means you can MapReduce through a mountain of data
during an interactive session of exploratory analysis.
How It Works
RHIPE sits between you and Hadoop. You write your Map and Reduce
functions as R code, and RHIPE handles the scut work of invoking Hadoop
commands.
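To make the division of labor concrete, here is a minimal sketch in plain R that imitates the map, shuffle, and reduce steps on a small in-memory vector. It does not use RHIPE or Hadoop at all, and every name in it (map_fn, reduce_fn, run_local_mapreduce) is hypothetical and chosen only for illustration. It shows the kind of logic you supply as Map and Reduce code; in a real job, RHIPE and Hadoop would take the place of run_local_mapreduce.

```r
# Local, Hadoop-free sketch of the MapReduce flow. This is NOT RHIPE's API;
# all function names here are hypothetical.

# Map: turn one input line into a list of (word, 1) key/value pairs.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]          # drop empty tokens
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reduce: collapse all values collected for one key into a single result.
reduce_fn <- function(key, values) {
  sum(unlist(values))
}

# Stand-in for the framework: run the Map step, group values by key
# (the "shuffle"), then call the Reduce step once per key.
run_local_mapreduce <- function(input, map_fn, reduce_fn) {
  pairs   <- unlist(lapply(input, map_fn), recursive = FALSE)
  keys    <- vapply(pairs, function(p) p$key, character(1))
  values  <- lapply(pairs, function(p) p$value)
  grouped <- split(values, keys)
  mapply(reduce_fn, names(grouped), grouped)
}

input  <- c("R meets Hadoop", "Hadoop meets MapReduce")
counts <- run_local_mapreduce(input, map_fn, reduce_fn)
print(counts)
```

The result is a named vector of word counts. The point of the sketch is only that the Map and Reduce pieces are ordinary R; the grouping and distribution work in the middle is what the framework does for you.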
To give you a quick example, here’s a typical RHIPE call:
rhipe.job.def <- rhmr(
    map= ... block of R code for Mapper ,
    reduce= ... block of R code for Reducer ,
    ifolder="/path/to/input" ,
    ofolder="/path/to/output" ,
    ... a couple other RHIPE options
)
rhex( rhipe.job.def )
That’s it! There’s no ...