How it works...
In this recipe, we replaced the missing values in numerical variables with a value at the end of the distribution using pandas and Feature-engine. These values were calculated using the IQR proximity rule or the mean and standard deviation. First, we loaded the data and divided it into train and test sets using train_test_split(), as described in the Performing mean or median imputation recipe.
To impute missing data with pandas, we calculated the values at the end of the distributions using the IQR proximity rule or the mean and standard deviation according to the formulas we described in the introduction to this recipe. We determined the quantiles using pandas quantile() and the mean and standard deviation using pandas ...
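The pandas steps described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's exact code: the toy DataFrame, the column names `a` and `b`, the 1.5 factor in the IQR proximity rule, and the factor of 3 for the standard deviation are all assumptions, as is the use of `mean()` and `std()` for the Gaussian approximation. The formulas in the introduction to this recipe take precedence if they differ.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data with missing values (not the recipe's dataset).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "a": rng.normal(10, 2, 200),
    "b": rng.exponential(3, 200),
})
X.loc[X.sample(frac=0.1, random_state=0).index, "a"] = np.nan
X.loc[X.sample(frac=0.1, random_state=1).index, "b"] = np.nan
y = pd.Series(rng.integers(0, 2, 200))

# Split first, so the imputation values are learned from the train set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# IQR proximity rule (assumed form): 75th quantile + 1.5 * IQR.
# quantile() skips NaN by default.
q1 = X_train.quantile(0.25)
q3 = X_train.quantile(0.75)
iqr = q3 - q1
iqr_values = (q3 + 1.5 * iqr).to_dict()

# Mean and standard deviation (assumed form): mean + 3 * std.
gaussian_values = (X_train.mean() + 3 * X_train.std()).to_dict()

# Replace missing values in both sets with the train-set values.
# Pass gaussian_values instead to use the mean and standard deviation variant.
X_train_t = X_train.fillna(value=iqr_values)
X_test_t = X_test.fillna(value=iqr_values)
```

Storing the values in a dictionary keyed by column name lets `fillna()` apply a different replacement to each variable in a single call, and reusing the train-set values on the test set avoids leaking information from the test data.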