Preface
In a world where information is growing exponentially, leading tools like Apache Spark provide support to solve many of the relevant problems we face today. From companies looking for ways to improve based on data-driven decisions, to research organizations solving problems in health care, finance, education, and energy, Spark enables analyzing much more information faster and more reliably than ever before.
Various books have been written for learning Apache Spark; for instance, Spark: The Definitive Guide is a comprehensive resource, and Learning Spark is an introductory book meant to help users get up and running (both are from O’Reilly). However, as of this writing, there is neither a book to learn Apache Spark using the R computing language nor a book specifically designed for the R user or the aspiring R user.
There are some resources online to learn Apache Spark with R, most notably the spark.rstudio.com site and the Spark documentation site at spark.apache.org. Both sites are great online resources; however, the content is not intended to be read from start to finish and assumes you, the reader, have some knowledge of Apache Spark, R, and cluster computing.
The goal of this book is to help anyone get started with Apache Spark using R. Additionally, because the R programming language was created to simplify data analysis, it is also our belief that this book provides the easiest path for you to learn the tools used to solve data analysis problems with Spark. The ...