Practical Predictive Analytics by Ralph Winters

Which correlations to use?

For the covariance matrix, we can either use separate matrices for the two diabetes outcomes (1 and 0), or use a single pooled covariance matrix, which summarizes the relationships among the variables across both outcomes combined.
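The two options can be sketched in plain Python. The two-variable rows are hypothetical toy values, not the diabetes data. The pooled matrix here is the usual pooled within-group estimate, in which each outcome's covariance matrix is weighted by its degrees of freedom (n_k - 1). A single covariance computed over all rows while ignoring the outcome (the total covariance) generally differs from it, because it also absorbs the gap between the two outcome means.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def pooled_covariance(groups):
    """Pooled within-group covariance: each group's matrix weighted by n_k - 1."""
    p = len(groups[0][0])
    total_df = sum(len(g) - 1 for g in groups)
    pooled = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = covariance_matrix(g)
        w = (len(g) - 1) / total_df
        for i in range(p):
            for j in range(p):
                pooled[i][j] += w * s[i][j]
    return pooled


# Hypothetical toy data: two variables per row, split by outcome (1 or 0).
by_outcome = {
    1: [[5.0, 1.0], [7.0, 3.0], [6.0, 2.0], [8.0, 4.0]],
    0: [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 3.0], [2.0, 0.0]],
}

# Option 1: one covariance matrix per outcome.
separate = {k: covariance_matrix(v) for k, v in by_outcome.items()}

# Option 2: one pooled matrix shared by both outcomes.
pooled = pooled_covariance(list(by_outcome.values()))
```

With these toy rows the pooled matrix works out to [[1, 1], [1, 15/7]], a weighted blend of the two per-outcome matrices.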

We will use separate correlation (or covariance) matrices, since we have enough observations for each outcome (n=500 and n=268). If either class were much smaller relative to the other, we could use the pooled (or total) covariance matrix instead, since it would be estimated from a larger set of observations.
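The decision rule in the paragraph above can be expressed as a small helper. This is a sketch under an assumption: "much smaller" is turned into a hypothetical cut-off, `min_ratio`, on the size of the smallest class relative to the largest. The 0.25 default is chosen only for illustration and is not taken from the text.

```python
def choose_covariance_strategy(class_counts, min_ratio=0.25):
    """Return "separate" when every class is reasonably large relative to the
    largest class, otherwise "pooled".

    min_ratio is a hypothetical cut-off used for illustration only.
    """
    largest = max(class_counts)
    smallest = min(class_counts)
    return "separate" if smallest / largest >= min_ratio else "pooled"


# Class sizes quoted in the text (n=500 and n=268).
print(choose_covariance_strategy([500, 268]))  # separate (268 / 500 = 0.536)

# A hypothetical, heavily imbalanced case.
print(choose_covariance_strategy([500, 20]))   # pooled (20 / 500 = 0.04)
```

A ratio is only one way to encode "much smaller"; an absolute minimum count per class would serve the same purpose.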

Some notes on the code that follows:

  • As a reminder, always set a random seed before running a simulation. This ensures that you get the same random results every time you run the code.
  • The
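Setting a seed before a simulation can be sketched in Python as follows. The sketch assumes that each outcome class is simulated from a multivariate normal distribution with that class's covariance matrix. The mean vector, covariance values, and seed are hypothetical. Draws are generated through a Cholesky factor of the covariance matrix, using a dedicated `random.Random(seed)` generator so that the same seed always reproduces the same rows.

```python
import random


def cholesky(cov):
    """Lower-triangular L with L @ L^T == cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (cov[i][i] - s) ** 0.5
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_mvn(mean, cov, n_rows, seed):
    """Draw n_rows from a multivariate normal using a dedicated seeded generator."""
    rng = random.Random(seed)  # seed first, so every run is reproducible
    L = cholesky(cov)
    rows = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in mean]  # independent standard normals
        # x = mean + L z gives draws whose covariance is L L^T = cov
        rows.append([m + sum(L[i][k] * z[k] for k in range(i + 1))
                     for i, m in enumerate(mean)])
    return rows


# Hypothetical mean vector and covariance matrix for one outcome class.
mean = [120.0, 32.0]
cov = [[900.0, 40.0], [40.0, 50.0]]

run1 = simulate_mvn(mean, cov, n_rows=5, seed=1020)
run2 = simulate_mvn(mean, cov, n_rows=5, seed=1020)
print(run1 == run2)  # True: same seed, same draws
```

Using a local generator instead of reseeding the global one keeps the simulation reproducible even if other code draws random numbers in between.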