Chapter 2. snow
snow (“Simple Network of
Workstations”) is probably the most popular parallel programming package
available for R. It was written by Luke Tierney, A. J. Rossini, Na Li, and
H. Sevcikova, and is actively maintained by Luke Tierney. It is a mature
package, first released on the “Comprehensive R Archive Network” (CRAN) in
2003.
Quick Look
Motivation: You want to use a Linux cluster to run an R script faster. For example, you’re running a Monte Carlo simulation on your laptop, but you’re sick of waiting many hours or days for it to finish.
Solution: Use snow to run your R code on your company or
university’s Linux cluster.
Good because: snow fits well into a traditional cluster
environment and can take advantage of high-speed communication
networks, such as InfiniBand, by using MPI.
How It Works
snow makes it easy to execute R functions in parallel. Most of the
parallel execution functions in snow are variations of the
standard lapply() function, which makes
snow fairly easy to learn. To implement
these parallel operations, snow uses a
master/worker architecture, where the master sends tasks to the workers,
and the workers execute the tasks and return the results to the
master.
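To make the lapply() analogy concrete, here is a minimal sketch. It assumes the snow package is installed and that two worker processes can be started on the local machine using socket connections. It uses snow's makeCluster(), parLapply(), and stopCluster() functions. The variable names and the squaring function are only illustrative.

```r
# Assumes the snow package is installed, e.g. install.packages("snow").
library(snow)

# Sequential version: lapply() applies the function to each element.
seq_results <- lapply(1:4, function(x) x^2)

# Start a cluster of two worker processes on the local machine,
# using socket connections to communicate with the master.
cl <- makeCluster(2, type = "SOCK")

# parLapply() has the same shape as lapply(). The master splits the
# input among the workers, the workers evaluate the function, and the
# results are sent back to the master and combined into one list.
par_results <- parLapply(cl, 1:4, function(x) x^2)

# Shut down the worker processes when finished.
stopCluster(cl)

# The parallel result should match the sequential one.
print(all.equal(seq_results, par_results))
```

Apart from creating and stopping the cluster, the only change from the sequential code is replacing lapply() with parLapply() and passing the cluster object as the first argument.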
One important feature of snow is
that it can use different transport mechanisms to communicate
between the master and workers. This makes it portable while still
letting it take advantage of high-performance communication mechanisms when they are available.
snow can be used with socket connections, ...