How it works...
We replaced the missing values in the Credit Approval Data Set with the median of each variable using pandas, scikit-learn, and Feature-engine. Because the mean or median values should be learned from the train set only, we first divided the dataset into train and test sets. To do so, in step 3, we used scikit-learn's train_test_split() function, which takes as arguments the dataset with the predictor variables, the target, the proportion of observations to retain in the test set, and a random_state value for reproducibility. To obtain a dataset with predictor variables only, we used pandas drop() with the target variable, A16, as an argument. To obtain the target, we sliced the dataframe on the target column, A16. By doing ...
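The split-then-impute workflow described above can be sketched as follows. This is a minimal illustration, not the recipe's exact code. It assumes a small hypothetical toy dataframe in place of the Credit Approval Data Set, with numerical predictors `A2` and `A3` and the target `A16`. It also assumes `test_size=0.3` and `random_state=0`. The medians are learned from the train set and then used to fill both sets, first with pandas and then with scikit-learn's `SimpleImputer`. The Feature-engine variant is left out so the sketch depends only on pandas, NumPy, and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the Credit Approval Data Set:
# A2 and A3 are numerical predictors with missing values, A16 is the target.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "A2": rng.normal(30, 10, 20),
    "A3": rng.normal(5, 2, 20),
    "A16": rng.integers(0, 2, 20),
})
data.loc[[1, 5, 9], "A2"] = np.nan
data.loc[[2, 7], "A3"] = np.nan

# Split into train and test sets: predictors via drop(), target via column slice.
X_train, X_test, y_train, y_test = train_test_split(
    data.drop("A16", axis=1),  # predictor variables only
    data["A16"],               # target
    test_size=0.3,             # proportion of observations in the test set (assumed value)
    random_state=0,            # for reproducibility
)

# pandas: learn the medians from the train set only...
medians = X_train.median()
# ...then use those train-set medians to fill both sets.
X_train_pd = X_train.fillna(medians)
X_test_pd = X_test.fillna(medians)

# scikit-learn: the same idea with SimpleImputer, fitted on the train set only.
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)
X_train_sk = pd.DataFrame(
    imputer.transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_sk = pd.DataFrame(
    imputer.transform(X_test), columns=X_test.columns, index=X_test.index
)
```

Fitting on the train set and only transforming the test set keeps test-set information out of the learned medians. After fitting, the values the imputer learned are available in `imputer.statistics_` and should match the pandas `medians`.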