Chapter 7. RHIPE

This chapter is a guide to Saptarshi Guha’s RHIPE package, the R and Hadoop Integrated Processing Environment. RHIPE’s development dates back to 2009, and the package is still actively maintained by its original author.

Compared to R+Hadoop, RHIPE shields you from raw Hadoop, but you still need to understand the MapReduce model.

Since the previous two chapters covered a lot of MapReduce and Hadoop details, this chapter takes a very short route to the examples.

Quick Look

Motivation: You like the power of MapReduce, as explained in the previous chapter, but you want something a little more R-centric.

Solution: Use the RHIPE R package as your Hadoop emissary. Even though you’ll still have to understand MapReduce, you won’t have to directly touch Hadoop.

Good because: You get Hadoop’s power without leaving the comfy confines of R’s language and interactive shell. (RHIPE even includes tools to work with HDFS.) This means you can MapReduce through a mountain of data during an interactive session of exploratory analysis.

How It Works

RHIPE sits between you and Hadoop. You write your Map and Reduce functions as R code, and RHIPE handles the scut work of invoking Hadoop commands.

To give you a quick example, here’s a typical RHIPE call:

rhipe.job.def <- rhmr(
        map= ... block of R code for Mapper ... ,
        reduce= ... block of R code for Reducer ... ,
        ifolder="/path/to/input" ,
        ofolder="/path/to/output" ,
        ... a couple other RHIPE options ...
)

rhex( rhipe.job.def )

That’s it! There’s no ...
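For intuition about the kind of logic that goes into the Mapper and Reducer blocks in the call above, here is a minimal sketch in plain R. It does not use RHIPE or Hadoop at all: it simulates the Map, shuffle, and Reduce steps in a single R session on a tiny word-count problem. Every name in it (input.lines, mapper, reducer, word.counts) is hypothetical and chosen for illustration, and the real RHIPE blocks would be written against RHIPE's own conventions rather than as ordinary functions like these.

```r
# Plain-R simulation of the MapReduce flow. No RHIPE or Hadoop calls here;
# all names are hypothetical and exist only for this illustration.

input.lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map step: turn one input record (a line) into a list of (key, value) pairs.
mapper <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reduce step: combine all of the values that share a single key.
reducer <- function(key, values) {
  list(key = key, value = sum(values))
}

# Run the Map step over every record, then flatten to one list of pairs.
pairs <- unlist(lapply(input.lines, mapper), recursive = FALSE)

# Shuffle step: group values by key. In a real job, Hadoop does this
# between the Map and Reduce phases.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, integer(1))
grouped <- split(values, keys)

# Run the Reduce step once per distinct key.
reduced     <- Map(reducer, names(grouped), grouped)
word.counts <- vapply(reduced, function(r) r$value, integer(1))

print(word.counts)
```

In this toy input, "the" appears three times and "fox" twice, so those are the counts the sketch reports for those keys. The point is the shape of the computation: the Mapper sees one record at a time and emits key/value pairs, and the Reducer sees one key together with all of its values.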

Parallel R by Q. Ethan McCallum, Stephen Weston
