Book description
If you’re like most R users, you have deep knowledge of and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
- Analyze, explore, transform, and visualize data in Apache Spark with R
- Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
- Perform analysis and modeling across many machines using distributed computing techniques
- Use large-scale data from multiple sources and different formats with ease from within Spark
- Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
- Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions
Table of contents
- Foreword
- Preface
- 1. Introduction
- 2. Getting Started
- 3. Analysis
- 4. Modeling
- 5. Pipelines
- 6. Clusters
- 7. Connections
- 8. Data
- 9. Tuning
- 10. Extensions
- 11. Distributed R
- 12. Streaming
- 13. Contributing
- A. Supplemental Code References
- Index
Product information
- Title: Mastering Spark with R
- Author(s): Javier Luraschi, Kevin Kuo, Edgar Ruiz
- Release date: October 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492046370