To proceed with the recipe, let's import the required tools and prepare the dataset:
- Import pandas and the required functions and classes from scikit-learn and Feature-engine:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.missing_data_imputers import CategoricalVariableImputer
- Let's load the dataset:
data = pd.read_csv('creditApprovalUCI.csv')
- Let's separate the data into train and test sets:
X_train, X_test, y_train, y_test = train_test_split(
    data.drop('A16', axis=1), data['A16'], test_size=0.3, random_state=0)
- Let's replace the missing values in four categorical variables with the string Missing:
for var in ['A4', 'A5', 'A6', 'A7']: ...
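The split-then-impute pattern in these steps can be tried without the CSV file. The following is a minimal sketch, not the recipe's own loop body: the `toy` DataFrame and its values are made up for illustration, only two of the four variables are included, and plain pandas `fillna` is assumed as the replacement method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the credit approval data (values are made up).
toy = pd.DataFrame({
    "A4": ["u", np.nan, "y", "u", "y", np.nan, "u", "y", "u", "y"],
    "A5": ["g", "p", np.nan, "g", "p", "g", np.nan, "g", "p", "g"],
    "A16": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Same split call as in the recipe, applied to the toy frame.
X_train, X_test, y_train, y_test = train_test_split(
    toy.drop("A16", axis=1), toy["A16"], test_size=0.3, random_state=0
)

# Work on explicit copies so the assignments below do not touch `toy`.
X_train, X_test = X_train.copy(), X_test.copy()

# Assumed approach: replace NaN with the string "Missing" in each column,
# in both the train and the test set.
for var in ["A4", "A5"]:
    X_train[var] = X_train[var].fillna("Missing")
    X_test[var] = X_test[var].fillna("Missing")
```

Assigning the result of `fillna` back to the column, rather than calling it with `inplace=True`, avoids chained-assignment warnings in recent pandas versions.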