How it works...
In this recipe, we replaced missing values in numerical variables in the Credit Approval Data Set with an arbitrary number, 99, using pandas, scikit-learn, and Feature-engine. We loaded the data and divided it into train and test sets using train_test_split() from scikit-learn, as described in the Performing mean or median imputation recipe.
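The split-then-impute workflow described above can be sketched with scikit-learn's `SimpleImputer` using `strategy="constant"` and `fill_value=99`. This is a minimal sketch, not the recipe's own code. The small DataFrame and the column names `A2`, `A3`, `A8`, and `A11` are hypothetical stand-ins for the Credit Approval variables, and the `test_size` and `random_state` values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Credit Approval data: four numerical
# columns containing missing values, plus a binary target.
data = pd.DataFrame({
    "A2": [30.8, np.nan, 24.5, 27.8, 20.2, np.nan, 32.1, 22.9],
    "A3": [0.0, 4.5, np.nan, 1.5, 5.6, 11.0, np.nan, 0.5],
    "A8": [1.25, 3.04, 1.5, np.nan, 1.71, 2.5, 6.5, np.nan],
    "A11": [1, 6, 0, 5, np.nan, 0, 0, 2],
    "target": [1, 1, 0, 1, 0, 1, 0, 0],
})
numeric_vars = ["A2", "A3", "A8", "A11"]

# Separate the predictors from the target, then split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    data.drop("target", axis=1),
    data["target"],
    test_size=0.25,
    random_state=0,
)

# A constant-strategy imputer replaces every NaN with the arbitrary value 99.
imputer = SimpleImputer(strategy="constant", fill_value=99)
imputer.fit(X_train[numeric_vars])

# Apply the same replacement to both the train and test sets.
X_train[numeric_vars] = imputer.transform(X_train[numeric_vars])
X_test[numeric_vars] = imputer.transform(X_test[numeric_vars])
```

With a constant strategy, `fit()` learns nothing from the data. It only records the fill value, so the train and test sets receive the same replacement by construction.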
To determine which arbitrary value to use, we inspected the maximum values of four numerical variables using the pandas max() method. Next, we chose a value, 99, that was bigger than the maximum values of the selected variables. In step 5, we used a for loop over the numerical variables to replace any missing data with the pandas fillna() method while passing 99 as an argument and setting ...
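The pandas steps described above, which check the maxima with `max()` and then fill each variable in a `for` loop with `fillna()`, might look roughly like the following. This is a hedged sketch under stated assumptions. The data and the column names `A2`, `A3`, `A8`, and `A11` are hypothetical stand-ins, and the sketch assigns each filled column back to the DataFrame rather than reproducing the recipe's exact `fillna()` arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical training data standing in for four numerical variables.
X_train = pd.DataFrame({
    "A2": [30.8, np.nan, 24.5, 27.8, 20.2, np.nan, 32.1, 22.9],
    "A3": [0.0, 4.5, np.nan, 1.5, 5.6, 11.0, np.nan, 0.5],
    "A8": [1.25, 3.04, 1.5, np.nan, 1.71, 2.5, 6.5, np.nan],
    "A11": [1, 6, 0, 5, np.nan, 0, 0, 2],
})
numeric_vars = ["A2", "A3", "A8", "A11"]

# Maximum of each variable; max() skips NaN by default.
max_values = X_train[numeric_vars].max()
print(max_values)

# The arbitrary value should lie outside the observed range of every variable.
arbitrary_value = 99
assert arbitrary_value > max_values.max()

# Replace the missing values in each numerical variable with the arbitrary value.
for var in numeric_vars:
    X_train[var] = X_train[var].fillna(arbitrary_value)
```

Choosing a value above every maximum keeps the imputed entries distinguishable from genuine observations. The same loop would be run on the test set with the same value.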