Chapter 7. Extending Spark with SparkR

Statisticians and data scientists have been using R to solve challenging problems in almost every field, ranging from bioinformatics to election campaigns. They prefer R due to its powerful visualization capabilities, strong community, and rich package ecosystem for statistics and machine learning. Many academic institutions around the world teach data science and statistics using the R language.

R was originally created by and for statisticians around the mid-1990s, with the goal of delivering a better and more user-friendly way to perform data analysis. R was initially used in academia and research. As businesses became increasingly aware of the role of data science in their business growth, the number of ...
