Parallel Computation with R

One of the best techniques for speeding up large computing problems is to break them into lots of little pieces, solve the pieces separately on different processors, and then put the results back together. This is called parallel computing, because the pieces are worked on at the same time, in parallel. For example, suppose that you had a lot of laundry: enough to fill 10 washing machines. Suppose each wash took 45 minutes, and each drying took 45 minutes. If you had only one washing machine and one dryer, it would take 495 minutes to finish all the laundry: 45 minutes for the first wash, followed by 10 loads of drying at 45 minutes each, because the single dryer becomes the bottleneck once washing and drying overlap. However, if you had 10 washing machines and 10 dryers, you could finish the laundry in 90 minutes.

In Chapters 20 and 21, we will show some cutting-edge techniques for statistical modeling. Many of these techniques are computationally intensive and can take a long time to run. Luckily, many of them are also highly parallelizable. For example, we will show several algorithms that build models by fitting a large number of tree models to the underlying data (such as boosting, bagging, and random forests). Each of these algorithms could run in parallel if more processors were available.

In Looping Extensions, we showed some extensions to R’s built-in looping functions. Revolution Computing developed these extensions to facilitate parallel computation. Revolution Computing has also released a package called doMC that provides a parallel backend for these looping extensions, making it possible to run R code on multiple cores.

To write code that takes advantage of multiple cores, you load the doMC package, register it as the parallel backend, and then use the looping extensions described earlier to split the work across cores.
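As a rough sketch of how the pieces fit together, the following example registers doMC as the parallel backend and then uses foreach's %dopar% operator to fit a handful of bootstrap trees in parallel. The choice of four cores, the rpart package, and the iris data are illustrative assumptions, not anything prescribed here.

library(foreach)
library(doMC)
library(rpart)

# Register the multicore backend; here we assume 4 cores are available.
registerDoMC(cores = 4)

# Fit 10 classification trees, one per bootstrap sample, with the
# iterations distributed across the registered cores.
trees <- foreach(i = 1:10) %dopar% {
  boot <- iris[sample(nrow(iris), replace = TRUE), ]
  rpart(Species ~ ., data = boot)
}

# trees is an ordinary list of 10 fitted rpart models.
length(trees)

Because doMC relies on forking, this particular backend works only on Unix-like systems; on other platforms a different backend can be registered in its place, and the foreach code itself does not change.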
