O'Reilly logo

Data Analysis with R - Second Edition by Tony Fischetti

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Mean substitution

Mean substitution, as the name suggests, replaces all the missing values with the mean of the available cases. Take a look at the following code, for example:

mean_sub <- miss_mtcars 
mean_sub$qsec[is.na(mean_sub$qsec)] <- mean(mean_sub$qsec, 
                                            na.rm=TRUE) 
# etc... 

Although this seemingly solves the problem of the loss of sample size in the list-wise deletion procedure, mean substitution has some very unsavory properties of its own. Whilst mean substitution produces unbiased estimates of the mean of a column, it produces biased estimates of the variance, since it removes the natural variability that would have occurred in the missing values had they not been missing. The variance estimates from mean substitution will therefore ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required