SparkR DataFrames support a range of operations for structured data processing. In this recipe, we'll work through examples of several of them, such as selection, grouping, and aggregation.
To step through this recipe, you will need a running Spark cluster, either in pseudo-distributed mode or in one of the distributed modes, that is, standalone, YARN, or Mesos. You will also need RStudio installed. Refer to the Installing R recipe for details on installing R, and to the Creating SparkR DataFrames recipe to learn how to create DataFrames from a variety of data sources.
In this recipe, we'll see how to perform various operations on SparkR DataFrames: