Chapter 7. Extending Spark with SparkR

Statisticians and data scientists have been using R to solve challenging problems in almost every field, ranging from bioinformatics to election campaigns. They prefer R due to its powerful visualization capabilities, strong community, and rich package ecosystem for statistics and machine learning. Many academic institutions around the world teach data science and statistics using the R language.

R was originally created by and for statisticians in the mid-1990s, with the goal of delivering a better and more user-friendly way to perform data analysis. R was initially used in academia and research. As businesses became increasingly aware of the role of data science in their growth, the number of ...
