June 2017
Beginner to intermediate
576 pages
15h 22m
English
Correlations and covariances can also be computed directly from a Spark DataFrame. In our example, the correlation between age and glucose is larger for non-diabetic patients than for diabetic patients.
First, the diabetic outcomes. The correlation is 0.113.

Now, the non-diabetic outcomes. The correlation is 0.22.

For the entire population, the correlation is 0.26.
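The listings that produced these three figures are not reproduced here, so the following is only a minimal sketch of how such a comparison might look with the PySpark DataFrame API. The column names `age`, `glucose`, and `outcome`, the 1/0 coding of the outcome, and the helper names `pearson` and `correlations_by_outcome` are all assumptions made for illustration, not the original code. The plain-Python `pearson` function is included only to show the statistic being computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def correlations_by_outcome(df, x="age", y="glucose", label="outcome"):
    """Correlate x and y for each subgroup and for the whole population.

    `df` is expected to be a pyspark.sql.DataFrame. The column names and
    the 1 = diabetic / 0 = non-diabetic coding are assumptions.
    """
    diabetic = df.filter(df[label] == 1)
    non_diabetic = df.filter(df[label] == 0)
    return {
        # DataFrame.stat.corr computes the Pearson coefficient on the cluster.
        "diabetic": diabetic.stat.corr(x, y),
        "non_diabetic": non_diabetic.stat.corr(x, y),
        "all": df.stat.corr(x, y),
    }
```

Calling `correlations_by_outcome(df)` on a DataFrame with those columns returns one coefficient per group. Swapping `stat.corr` for `stat.cov` would give the sample covariances in the same way.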
