June 2017
Beginner to intermediate
576 pages
15h 22m
English
There are a couple more transformations that we can perform on our hospital dataset.
We would like to standardize both the Total.Costs variable and the Total.Charges variable so that we can compare them more easily. Predictive models like to see variables standardized because there is less bias towards any particular variable, since all standardized variables have approximately the same magnitude.
The way we do this is by normalizing the means to 0, with a standard deviation of 1. Computational means that we will take each value, subtract the mean and divide the results by the standard deviation.
In previous examples, we have explicitly referred to variables using their full name (for example, df$Total.Costs, which refers ...