Chapter 2. snow
snow (“Simple Network of Workstations”) is probably the most popular parallel programming package available for R. It was written by Luke Tierney, A. J. Rossini, Na Li, and H. Sevcikova, and is actively maintained by Luke Tierney. It is a mature package, first released on the “Comprehensive R Archive Network” (CRAN) in 2003.
Quick Look
Motivation: You want to use a Linux cluster to run an R script faster. For example, you’re running a Monte Carlo simulation on your laptop, but you’re sick of waiting many hours or days for it to finish.
Solution: Use snow to run your R code on your company or university’s Linux cluster.
Good because: snow fits well into a traditional cluster environment, and is able to take advantage of high-speed communication networks, such as InfiniBand, using MPI.
How It Works
snow provides support for easily executing R functions in parallel. Most of the parallel execution functions in snow are variations of the standard lapply() function, making snow fairly easy to learn. To implement these parallel operations, snow uses a master/worker architecture, where the master sends tasks to the workers, and the workers execute the tasks and return the results to the master.
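To make the lapply() analogy concrete, here is a minimal sketch that runs the same computation sequentially and then in parallel. It assumes the snow package is installed and that two worker processes can be started on the local machine over sockets. The square function and the worker count are hypothetical, chosen only for illustration.

```r
# Minimal sketch: assumes the snow package is installed and that two
# local worker processes can be started over socket connections.
library(snow)

# Start a master/worker cluster with two workers on the local machine.
cl <- makeCluster(2, type = "SOCK")

# Hypothetical task, for illustration only: square a number.
square <- function(x) x^2

# Sequential version using the standard lapply().
seq_result <- lapply(1:4, square)

# Parallel version: the master splits the input among the workers,
# each worker applies the function to its share, and the results are
# returned to the master in the original order.
par_result <- parLapply(cl, 1:4, square)

# Shut the workers down when finished.
stopCluster(cl)

# Both calls should produce the same list.
stopifnot(identical(seq_result, par_result))
```

For a task this small, the cost of sending work to the workers will likely outweigh any gain. The point is only that parLapply() takes the cluster object as an extra first argument and otherwise reads like lapply().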
One important feature of snow is that it can be used with different transport mechanisms to communicate between the master and workers. This allows it to be portable, but still take advantage of high-performance communication mechanisms if available. snow can be used with socket connections, ...