
Mastering Spark with R

Book Description

If you’re like most R users, you have deep knowledge of and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.

Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.

  • Analyze, explore, transform, and visualize data in Apache Spark with R
  • Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
  • Perform analysis and modeling across many machines using distributed computing techniques
  • Use large-scale data from multiple sources and different formats with ease from within Spark
  • Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
  • Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions
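As a taste of the workflow the book covers, a minimal sparklyr session might look like the sketch below: connect to a local Spark instance, copy a small dataset into Spark, and analyze it with familiar dplyr verbs. (This assumes Spark has been installed locally, e.g. with `spark_install()`; the dataset and column names are illustrative.)

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance
# (first run spark_install() if Spark is not yet installed)
sc <- spark_connect(master = "local")

# Copy a built-in R dataset into Spark as a Spark DataFrame
cars <- copy_to(sc, mtcars)

# Analyze it with dplyr verbs, which sparklyr translates to Spark SQL
cars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# Close the connection when done
spark_disconnect(sc)
```

The same `dplyr` code runs unchanged against a remote cluster by pointing `spark_connect()` at a cluster master instead of `"local"`, which is the pattern Chapters 2, 3, and 6 build on.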

Table of Contents

  1. Foreword
  2. Preface
    1. Formatting
    2. Acknowledgments
    3. Conventions Used in This Book
    4. Using Code Examples
    5. O’Reilly Online Learning
    6. How to Contact Us
  3. 1. Introduction
    1. Overview
    2. Hadoop
    3. Spark
    4. R
    5. sparklyr
    6. Recap
  4. 2. Getting Started
    1. Overview
    2. Prerequisites
      1. Installing sparklyr
      2. Installing Spark
    3. Connecting
    4. Using Spark
      1. Web Interface
      2. Analysis
      3. Modeling
      4. Data
      5. Extensions
      6. Distributed R
      7. Streaming
      8. Logs
    5. Disconnecting
    6. Using RStudio
    7. Resources
    8. Recap
  5. 3. Analysis
    1. Overview
    2. Import
    3. Wrangle
      1. Built-in Functions
      2. Correlations
    4. Visualize
      1. Using ggplot2
      2. Using dbplot
    5. Model
      1. Caching
    6. Communicate
    7. Recap
  6. 4. Modeling
    1. Overview
    2. Exploratory Data Analysis
    3. Feature Engineering
    4. Supervised Learning
      1. Generalized Linear Regression
      2. Other Models
    5. Unsupervised Learning
      1. Data Preparation
      2. Topic Modeling
    6. Recap
  7. 5. Pipelines
    1. Overview
    2. Creation
    3. Use Cases
      1. Hyperparameter Tuning
    4. Operating Modes
    5. Interoperability
    6. Deployment
      1. Batch Scoring
      2. Real-Time Scoring
    7. Recap
  8. 6. Clusters
    1. Overview
    2. On-Premises
      1. Managers
      2. Distributions
    3. Cloud
      1. Amazon
      2. Databricks
      3. Google
      4. IBM
      5. Microsoft
      6. Qubole
    4. Kubernetes
    5. Tools
      1. RStudio
      2. Jupyter
      3. Livy
    6. Recap
  9. 7. Connections
    1. Overview
      1. Edge Nodes
      2. Spark Home
    2. Local
    3. Standalone
    4. YARN
      1. YARN Client
      2. YARN Cluster
    5. Livy
    6. Mesos
    7. Kubernetes
    8. Cloud
    9. Batches
    10. Tools
    11. Multiple Connections
    12. Troubleshooting
      1. Logging
      2. Spark Submit
      3. Windows
    13. Recap
  10. 8. Data
    1. Overview
    2. Reading Data
      1. Paths
      2. Schema
      3. Memory
      4. Columns
    3. Writing Data
    4. Copying Data
    5. File Formats
      1. CSV
      2. JSON
      3. Parquet
      4. Others
    6. File Systems
    7. Storage Systems
      1. Hive
      2. Cassandra
      3. JDBC
    8. Recap
  11. 9. Tuning
    1. Overview
      1. Graph
      2. Timeline
    2. Configuring
      1. Connect Settings
      2. Submit Settings
      3. Runtime Settings
      4. sparklyr Settings
    3. Partitioning
      1. Implicit Partitions
      2. Explicit Partitions
    4. Caching
      1. Checkpointing
      2. Memory
    5. Shuffling
    6. Serialization
    7. Configuration Files
    8. Recap
  12. 10. Extensions
    1. Overview
    2. H2O
    3. Graphs
    4. XGBoost
    5. Deep Learning
    6. Genomics
    7. Spatial
    8. Troubleshooting
    9. Recap
  13. 11. Distributed R
    1. Overview
    2. Use Cases
      1. Custom Parsers
      2. Partitioned Modeling
      3. Grid Search
      4. Web APIs
      5. Simulations
    3. Partitions
    4. Grouping
    5. Columns
    6. Context
    7. Functions
    8. Packages
    9. Cluster Requirements
      1. Installing R
      2. Apache Arrow
    10. Troubleshooting
      1. Worker Logs
      2. Resolving Timeouts
      3. Inspecting Partitions
      4. Debugging Workers
    11. Recap
  14. 12. Streaming
    1. Overview
    2. Transformations
      1. Analysis
      2. Modeling
      3. Pipelines
      4. Distributed R
    3. Kafka
    4. Shiny
    5. Recap
  15. 13. Contributing
    1. Overview
    2. The Spark API
    3. Spark Extensions
    4. Using Scala Code
    5. Recap
  16. A. Supplemental Code References
    1. Preface
      1. Formatting
    2. Chapter 1
      1. The World’s Capacity to Store Information
      2. Daily Downloads of CRAN Packages
    3. Chapter 2
      1. Prerequisites
    4. Chapter 3
      1. Hive Functions
    5. Chapter 4
      1. MLlib Functions
    6. Chapter 6
      1. Google Trends for On-Premises (Mainframes), Cloud Computing, and Kubernetes
    7. Chapter 12
      1. Stream Generator
      2. Installing Kafka
  17. Index