Practical Predictive Analytics by Ralph Winters

Joining the means and standard deviations with the training data

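Now we can join the aggregate means and standard deviations using crossJoin. No key is necessary: a cross join pairs every row of one table with every row of the other, and each aggregate table holds a single summary row, so the result is one combined row of means and standard deviations.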

Then join the summary data with the original data. The resulting object is another Spark dataframe (df_means) with the means and standard deviations of all of the variables joined to each row:

#join the means and standard deviations
both <- crossJoin(means,stds)
#join with the original data
df_means <- crossJoin(df,both)
nrow(df_means)
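To see why the row count is unchanged, the same idea can be sketched locally in base R, without a Spark session. This is only an analog of the SparkR code, not the book's method: the toy data frame, its values, and the `_mean` / `_sd` column suffixes are assumptions made for illustration, and `merge()` with `by = NULL` stands in for `crossJoin` because it also returns the Cartesian product.

```r
# Toy stand-in for the training data (hypothetical values, not from the book)
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

# One-row summaries; suffix the names so the joined columns stay distinct
means <- as.data.frame(lapply(df, mean))
names(means) <- paste0(names(df), "_mean")
stds <- as.data.frame(lapply(df, sd))
names(stds) <- paste0(names(df), "_sd")

# merge() with by = NULL returns the Cartesian product (a cross join)
both <- merge(means, stds, by = NULL)

# Cross join the single summary row onto every row of the original data
df_means <- merge(df, both, by = NULL)

# The summary has exactly one row, so the row count matches the original
print(nrow(df_means) == nrow(df))
print(head(df_means))
```

Because `both` has one row, the cross join simply repeats that row alongside each original observation. If either summary had more than one row, the result would grow to the product of the row counts, which is why a keyless join is only safe in this single-row case.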

Print a portion of the resultant object. All of the original variables ...
