By-passing missing values
It seems that missing data occur relatively frequently in the time-related variables, whereas the flight identifiers and dates have no missing values at all. On the other hand, if one value is missing for a flight, the chances are rather high that some other variables are missing as well, out of the overall number of 3,622 cases with at least one missing value:
> mean(cor(apply(hflights, 2, function(x)
+     as.numeric(is.na(x)))), na.rm = TRUE)
[1] 0.9589153
Warning message:
In cor(apply(hflights, 2, function(x) as.numeric(is.na(x)))) :
  the standard deviation is zero
Okay, let's see what we have done here! First, we have called the apply function to transform the values of data.frame to 0 or 1, where 0 stands for an ...
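To make the nested call easier to follow, here is a minimal sketch that unpacks it into separate steps. It runs on a small made-up data frame rather than on hflights: the flights object, its values, and the intermediate names na_flags, na_cor and avg_cor are assumptions introduced only for illustration.

```r
# Toy data standing in for hflights (hypothetical values, illustration only)
flights <- data.frame(
  FlightNum = c(1, 2, 3, 4, 5),             # identifier: never missing
  DepTime   = c(900, NA, 1130, NA, 1500),
  ArrTime   = c(1000, NA, 1300, NA, 1700),
  AirTime   = c(50, NA, 80, NA, NA)
)

# Step 1: recode every cell column by column, 1 if the value is missing, 0 otherwise
na_flags <- apply(flights, 2, function(x) as.numeric(is.na(x)))

# Step 2: correlate the missingness indicators of all column pairs.
# FlightNum has no missing values, so its indicator is constant (zero
# standard deviation); cor() returns NA for it and warns, which is
# silenced here to keep the output short.
na_cor <- suppressWarnings(cor(na_flags))

# Step 3: average the correlations, skipping the NA entries from step 2
avg_cor <- mean(na_cor, na.rm = TRUE)

print(na_flags)
print(round(na_cor, 2))
print(avg_cor)
```

Note that the averaged matrix also contains the diagonal, where each column is perfectly correlated with itself, so the resulting mean is pulled somewhat towards 1. Restricting the average to na_cor[upper.tri(na_cor)] would be one way to look at distinct column pairs only.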