Parallel R

by Q. Ethan McCallum, Stephen Weston

October 2011

Intermediate to advanced

126 pages

3h 10m

English

O'Reilly Media, Inc.

Content preview from Parallel R

Chapter 3. multicore

multicore is a popular parallel programming package for use on multiprocessor and multicore computers. It was written by Simon Urbanek and first released on CRAN in 2009. It caught on immediately because its clever use of the fork() system call allows it to implement a parallel lapply() operation that is even easier to use than snow’s parLapply().

Unfortunately, because fork() is a POSIX system call, multicore can’t really be used on Windows machines.^[33] Calling fork() can also cause problems for functions that use resources that were allocated or initialized exclusively for the master, or parent, process. This is particularly a problem with graphics functions, so using multicore with an R GUI isn’t generally recommended.^[34] Nevertheless, multicore works perfectly for most R functions on POSIX systems, such as Linux and Mac OS X, and its use of fork() makes it very efficient and convenient, as we’ll see in this chapter.
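Part of that convenience is that a forked worker starts out with a copy of the parent's workspace, so data and functions defined before the call are already visible to it. The following is a minimal sketch of that behavior, not code from the multicore package itself. It assumes a current R installation, where mclapply() is provided by the bundled parallel package rather than by multicore; the names scale_factor, lookup, and scaled are hypothetical and exist only for illustration.

```r
library(parallel)  # assumption: current R, where mclapply() is provided by parallel

# Data created in the parent (master) process before any workers exist
scale_factor <- 10
lookup <- c(a = 1, b = 2, c = 3)

# The worker function refers to lookup and scale_factor directly.
# Forked children inherit the parent's workspace, so nothing is exported.
scaled <- function(key) lookup[[key]] * scale_factor

# Forking is unavailable on Windows, where only mc.cores = 1 is accepted
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

result <- mclapply(names(lookup), scaled, mc.cores = n_cores)
print(unlist(result))
```

With a snow cluster, by contrast, the workers are separate R sessions, so objects like these would have to be sent to them explicitly before the computation starts.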

Quick Look

Motivation: You have an R script that spends an hour executing a function using lapply() on your laptop.

Solution: Replace lapply() with the mclapply() function from the multicore package.

Good because: It’s easy to install, easy to use, and makes use of hardware that you probably already own.
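The swap described above can be sketched as follows. This is an illustrative example under stated assumptions rather than a benchmark: slow_square is a hypothetical stand-in for an expensive function, the core count of 2 is arbitrary, and mclapply() is loaded from the parallel package that ships with current versions of R instead of from multicore.

```r
library(parallel)  # assumption: current R, where mclapply() is provided by parallel

# Hypothetical stand-in for a function that takes a long time per element
slow_square <- function(x) {
  Sys.sleep(0.1)  # simulate real work
  x^2
}

inputs <- 1:8

# Original, serial version
serial_result <- lapply(inputs, slow_square)

# Parallel version: same arguments, plus the number of worker processes.
# Forking is unavailable on Windows, where only mc.cores = 1 is accepted.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
parallel_result <- mclapply(inputs, slow_square, mc.cores = n_cores)

# Both calls return a list with one element per input, in input order
print(identical(serial_result, parallel_result))
```

Because the return value has the same shape as the one from lapply(), the rest of a script can usually stay unchanged.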

How It Works

multicore is intended to run on POSIX-based multiprocessor and multicore systems. This includes almost all modern Mac OS X and Linux desktop and laptop computers. It can also be used on single nodes of a Linux cluster, for example, but ...