Apache Spark is a fast, general-purpose, fault-tolerant framework for interactive and iterative computations on large, distributed datasets. It supports a wide variety of data sources as well as storage layers.
It provides unified data access, letting you combine different data formats with streaming data and define complex operations using high-level, composable operators. You can develop your applications interactively using the Scala, Python, or R shells. In this recipe, you will learn various ways of interacting with Apache Spark through R for data handling, along with predictive analysis on datasets.
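As a minimal sketch of what interacting with Spark from R looks like (assuming a local Spark installation with `SPARK_HOME` set, and using SparkR's session API from Spark 2.x; the dataset and app name are illustrative), you can start a session, distribute an R data frame, and apply composable operators:

```r
# SparkR ships with Spark under $SPARK_HOME/R/lib
library(SparkR)

# Start a local Spark session using all available cores
sparkR.session(master = "local[*]", appName = "recipe-intro")

# Convert a local R data.frame into a distributed SparkDataFrame
df <- as.DataFrame(faithful)

# Compose high-level operators: filter, then group and count
long_waits <- filter(df, df$waiting > 70)
head(summarize(groupBy(long_waits, long_waits$waiting),
               count = n(long_waits$waiting)))

# Shut the session down when finished
sparkR.session.stop()
```

Because the `SparkDataFrame` is distributed, the same `filter`/`groupBy`/`summarize` pipeline scales from this toy example to cluster-sized data without code changes.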