In this section, we try out some useful, commonly used operations. First, we perform them with traditional R/dplyr, and then we show the equivalent operations using the SparkR API:
> # Open the R shell, NOT the SparkR shell
> library(dplyr, warn.conflicts=FALSE)  # Load dplyr first
> # Perform a common, useful operation
> iris %>%
+   group_by(Species) %>%
+   summarise(avg_length = mean(Sepal.Length),
+             avg_width = mean(Sepal.Width)) %>%
+   arrange(desc(avg_length))
Source: local data frame [3 x 3]

     Species avg_length avg_width
      (fctr)      (dbl)     (dbl)
1  virginica      6.588     2.974
2 versicolor      5.936     2.770
3     setosa      5.006     3.428
> # Remove from R environment
> detach("package:dplyr", unload=TRUE)
This operation is very similar to a SQL GROUP BY and is followed ...