Mastering Spark with R

by Javier Luraschi, Kevin Kuo, Edgar Ruiz
October 2019
Beginner to intermediate
293 pages
6h 55m
English
O'Reilly Media, Inc.

Foreword

Apache Spark is a distributed computing platform built on extensibility: Spark’s APIs make it easy to combine input from many data sources and process it using diverse programming languages and algorithms to build a data application. R is one of the most powerful languages for data science and statistics, so it makes a lot of sense to connect R to Spark. Fortunately, R’s rich language features enable simple APIs for calling Spark from R that look similar to running R on local data sources. With a bit of background about both systems, you will be able to invoke massive computations in Spark or run your R code in parallel from the comfort of your favorite R programming environment.

This book explores using Spark from R in detail, focusing on the sparklyr package that enables support for dplyr and other packages known to the R community. It covers all of the main use cases in detail, ranging from querying data using the Spark engine to exploratory data analysis, machine learning, parallel execution of R code, and streaming. It also has a self-contained introduction to running Spark and monitoring job execution. The authors are exactly the right people to write about this topic—Javier, Kevin, and Edgar have been involved in sparklyr development since the project started. I was excited to see how well they’ve assembled this clear and focused guide about using Spark with R.

I hope that you enjoy this book and use it to scale up your R workloads and connect them to the capabilities ...

ISBN: 9781492046363