Parallel R

by Q. Ethan McCallum, Stephen Weston
October 2011
Intermediate to advanced
126 pages
3h 10m
English
O'Reilly Media, Inc.

Chapter 8. Segue

Welcome to the last of the book’s recipes for R parallelism. This will be a short chapter, but don’t let that fool you: Segue’s scope is intentionally narrow. This focus makes it a particularly powerful tool.

Segue’s mission is as simple as it gets: make it easy to use Elastic MapReduce as a parallel backend for lapply()-style operations. So easy, in fact, that it boasts of doing this in only two lines of R code.[59]

This narrow focus is no accident. Segue’s creator, JD Long, wanted occasional access to a Hadoop cluster to run his pleasantly parallel,[60] computationally expensive models. Elastic MapReduce was a great fit but still a bit cumbersome for his workflow. He created Segue to tackle the grunt work so he could focus on his higher-level modeling tasks.

Segue is a relatively young package. Nonetheless, since its creation in 2010, it has attracted a fair amount of attention.

Quick Look

Motivation: You want Hadoop power to drive some lapply() loops, perhaps for a parameter sweep, but you want minimal Hadoop contact. You consider MapReduce to be too much of a distraction from your work.

Solution: Use the segue package’s emrlapply() to send your calculations up to Elastic MapReduce, the Amazon Web Services cloud-based Hadoop product.

Good because: You get to focus on your modeling work, while segue takes care of transforming your lapply() work into a Hadoop job.
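
To make the Quick Look concrete, here is a minimal sketch of the kind of lapply() parameter sweep described above. It runs locally as written. The function simulate_mean() and the sweep list are hypothetical stand-ins for a real, expensive model. The segue calls shown in the trailing comments (setCredentials(), createCluster(), emrlapply(), stopCluster()) reflect our recollection of the package's interface and should be treated as assumptions to check against the segue documentation. They are left commented out because they need AWS credentials and start billable Elastic MapReduce instances.

```r
# A small parameter sweep written as a plain lapply() call.
# simulate_mean() is a hypothetical stand-in for an expensive model run.
simulate_mean <- function(params) {
  set.seed(params$seed)                        # make each task reproducible
  draws <- rnorm(params$n, mean = params$mu)   # simulate n observations
  mean(draws)                                  # return one summary value
}

# One list element per independent ("pleasantly parallel") task.
sweep <- list(
  list(seed = 1, n = 1000, mu = 0),
  list(seed = 2, n = 1000, mu = 5),
  list(seed = 3, n = 1000, mu = 10)
)

# Local, sequential version of the sweep.
local_results <- lapply(sweep, simulate_mean)

# Assumed segue equivalent (not run here; verify names and arguments
# against the package documentation before use):
#
#   library(segue)
#   setCredentials("YOUR_AWS_KEY", "YOUR_AWS_SECRET")  # placeholder values
#   myCluster   <- createCluster(numInstances = 3)
#   emr_results <- emrlapply(myCluster, sweep, simulate_mean)
#   stopCluster(myCluster)
```

The point of the sketch is the shape of the work: a list of independent inputs and a function applied to each. Anything already written this way is a candidate for emrlapply(), since only the call that walks the list changes.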

How It Works

Segue takes care of launching the Elastic MapReduce cluster, shipping data back and forth, and ...

ISBN: 9781449317850