June 2017
Beginner to intermediate
576 pages
15h 22m
English
If we are interested in comparing blood pressure values with average age, we can construct a query that computes the mean and standard deviation of both pressure and age on the larger Spark table (out_tbl), grouped by outcome. The output also indicates that diabetics are older on average:
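# mean and sample standard deviation of pressure and age for each outcome
bin_agg <- SparkR::sql("SELECT outcome,
                               mean(pressure) AS mean_pressure, std(pressure) AS std_pressure,
                               mean(age) AS mean_age, std(age) AS std_age
                        FROM out_tbl
                        GROUP BY outcome")
# register the result as a temporary view
# (createOrReplaceTempView supersedes the deprecated registerTempTable in Spark 2.0+)
SparkR::createOrReplaceTempView(bin_agg, "bin_agg")
# print a few records
SparkR::head(bin_agg)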
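To sanity-check what this aggregation computes, the same statistics can be reproduced locally in base R. Spark SQL's std() returns the sample standard deviation (n - 1 denominator), which matches R's sd(). The sketch below is a minimal illustration, assuming a tiny hypothetical data frame (toy, with invented values, not the book's dataset); the names toy and bin_agg_local are illustrative only.

```r
# Hypothetical toy data for illustration only (values are invented)
toy <- data.frame(
  outcome  = c(0, 0, 0, 1, 1, 1),
  pressure = c(60, 70, 80, 70, 80, 90),
  age      = c(21, 25, 29, 40, 50, 60)
)

# Split rows by outcome, then compute the same columns as the SQL query.
# sd() uses the n - 1 denominator, like Spark SQL's std()/stddev_samp().
groups <- split(toy, toy$outcome)
bin_agg_local <- do.call(rbind, lapply(names(groups), function(g) {
  d <- groups[[g]]
  data.frame(
    outcome       = as.numeric(g),
    mean_pressure = mean(d$pressure),
    std_pressure  = sd(d$pressure),
    mean_age      = mean(d$age),
    std_age       = sd(d$age)
  )
}))

print(bin_agg_local)
```

The same check could be run against a small sample pulled back from the Spark table with SparkR::collect(), comparing its local results with the rows returned by head(bin_agg).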
