June 2017
Beginner to intermediate
576 pages
15h 22m
English
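Now we can join the aggregate means and standard deviations together using a crossJoin. No key is necessary: a cross join pairs every row of one table with every row of the other, and each of the two summary tables holds a single row, so the result is one row containing both sets of statistics.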
Then cross join the summary data with the original data. The resulting object is another Spark dataframe (df_means) in which the means and standard deviations of all of the variables are attached to every row:
# join the means and standard deviations
both <- crossJoin(means, stds)
# join with the original data
df_means <- crossJoin(df, both)
nrow(df_means)
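To see why the row count is unchanged, the same pattern can be sketched in base R without a Spark session. This is only an analogy under stated assumptions: the toy data frame and the column names (x_mean, y_mean, x_std, y_std) are hypothetical, and merge() with by = NULL stands in for SparkR's crossJoin because it also returns the Cartesian product of its inputs.

```r
# Base R stand-in for the SparkR cross join; toy data, not the book's dataset.
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

# One-row summary tables, analogous to `means` and `stds` (hypothetical column names).
means <- data.frame(x_mean = mean(df$x), y_mean = mean(df$y))
stds  <- data.frame(x_std = sd(df$x), y_std = sd(df$y))

# merge() with by = NULL returns the Cartesian product of the two inputs.
both <- merge(means, stds, by = NULL)    # 1 row x 1 row = 1 row
df_means <- merge(df, both, by = NULL)   # 4 rows x 1 row = 4 rows

# The summary columns are repeated on every original row.
nrow(df_means) == nrow(df)
```

Because the summary table has exactly one row, the cross join adds columns without multiplying rows. If a summary table had more than one row, the result would contain that many copies of each original row.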

Print a portion of the resultant object. All of the original variables ...