Book description
It’s tough to argue with R as a high-quality, cross-platform, open-source statistical software product, unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets, including three chapters on using R and Hadoop together. You’ll learn the basics of snow, multicore, parallel, Segue, RHIPE, and Hadoop Streaming, including how to find them, how to use them, when they work well, and when they don’t.
With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or work around R’s memory limits by offloading work to multiple machines.
- snow: works well in a traditional cluster environment
- multicore: popular for multiprocessor and multicore computers
- parallel: part of the upcoming R 2.14.0 release
- R+Hadoop: provides low-level access to a popular form of cluster computing
- RHIPE: combines Hadoop’s power with R’s language and interactive shell
- Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
Table of contents
- Parallel R
- A Note Regarding Supplemental Files
- Preface
- 1. Getting Started
- 2. snow
  - Quick Look
  - How It Works
  - Setting Up
  - Working with It
    - Creating Clusters with makeCluster
    - Parallel K-Means
    - Initializing Workers
    - Load Balancing with clusterApplyLB
    - Task Chunking with parLapply
    - Vectorizing with clusterSplit
    - Load Balancing Redux
    - Functions and Environments
    - Random Number Generation
    - snow Configuration
    - Installing Rmpi
    - Executing snow Programs on a Cluster with Rmpi
    - Executing snow Programs with a Batch Queueing System
    - Troubleshooting snow Programs
  - When It Works…
  - …And When It Doesn’t
  - The Wrap-up
- 3. multicore
- 4. parallel
- 5. A Primer on MapReduce and Hadoop
- 6. R+Hadoop
- 7. RHIPE
- 8. Segue
- 9. New and Upcoming
- About the Authors
- Copyright
Product information
- Title: Parallel R
- Author(s):
- Release date: October 2011
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449320331