Parallel R

by Q. Ethan McCallum, Stephen Weston
October 2011
O'Reilly Media, Inc.

Chapter 7. RHIPE

This chapter is a guide to Saptarshi Guha’s RHIPE package, the R and Hadoop Integrated Processing Environment. RHIPE’s development dates back to 2009, and it is still actively maintained by the original author.

Compared to R+Hadoop, RHIPE shields you from raw Hadoop but still requires an understanding of the MapReduce model.

Because the previous two chapters covered MapReduce and Hadoop in detail, this chapter takes a very short route to the examples.

Quick Look

Motivation: You like the power of MapReduce, as explained in the previous chapter, but you want something a little more R-centric.

Solution: Use the RHIPE R package as your Hadoop emissary. Even though you’ll still have to understand MapReduce, you won’t have to directly touch Hadoop.

Good because: You get Hadoop’s power without leaving the comfy confines of R’s language and interactive shell. (RHIPE even includes tools to work with HDFS.) This means you can MapReduce through a mountain of data during an interactive session of exploratory analysis.

How It Works

RHIPE sits between you and Hadoop. You write your Map and Reduce functions as R code, and RHIPE handles the scut work of invoking Hadoop commands.
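To make the Map and Reduce roles concrete without a cluster, here is a minimal sketch in plain base R. It does not call RHIPE or Hadoop, and every name in it (map_fn, reduce_fn, run_local_mapreduce) is hypothetical, chosen only for illustration. The driver function stands in, very loosely, for the work Hadoop would do between your two functions.

```r
# Base-R sketch of the MapReduce flow. Nothing here uses RHIPE or Hadoop;
# all function names are hypothetical and exist only for this illustration.

# Mapper: takes one input record (a line of text), emits (word, 1) pairs.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]   # drop empty strings left by the split
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reducer: takes one key plus every value emitted for it, returns a total.
reduce_fn <- function(key, values) {
  sum(values)
}

# Driver: a local stand-in for the framework (map, group by key, reduce).
run_local_mapreduce <- function(records, mapper, reducer) {
  pairs   <- unlist(lapply(records, mapper), recursive = FALSE)
  keys    <- vapply(pairs, function(p) p$key, character(1))
  values  <- vapply(pairs, function(p) p$value, integer(1))
  grouped <- split(values, keys)  # the "shuffle": collect values per key
  mapply(reducer, names(grouped), grouped)
}

lines  <- c("the quick brown fox", "the lazy dog", "The fox")
counts <- run_local_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

The result is a named vector of word counts. The point of the sketch is the division of labor: you supply only the mapper and the reducer, and something else (here a ten-line function, in practice Hadoop by way of RHIPE) handles grouping and dispatch.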

To give you a quick example, here’s a typical RHIPE call:

rhipe.job.def <- rhmr(
        map= ... block of R code for Mapper ,
        reduce= ... block of R code for Reducer ,
        ifolder="/path/to/input" ,
        ofolder="/path/to/output" ,
        ... a couple other RHIPE options
)

rhex( rhipe.job.def )

That’s it! There’s no ...

ISBN: 9781449317850