Let's begin the recipe by importing the required libraries and preparing the data:
- Import pandas and the required functions and classes:
import pandas as pd
from sklearn.model_selection import train_test_split
from feature_engine.categorical_encoders import WoERatioCategoricalEncoder
- Let's load the dataset and divide it into train and test sets:
data = pd.read_csv('creditApprovalUCI.csv')
X_train, X_test, y_train, y_test = train_test_split(
    data, data['A16'], test_size=0.3, random_state=0)
- Let's create a pandas Series with the probability of the target being 1, that is, p(1), for each category in A1:
p1 = X_train.groupby(['A1'])['A16'].mean()
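To see why the grouped mean gives p(1), here is a minimal sketch on a toy DataFrame. The values are made up for illustration and do not come from the credit approval dataset, and the names toy and p1_toy are hypothetical. It assumes the target is encoded as 0 and 1, as the step above implies:

```python
import pandas as pd

# Toy data (hypothetical values, not taken from creditApprovalUCI.csv)
toy = pd.DataFrame({
    "A1": ["a", "a", "a", "b", "b"],
    "A16": [1, 0, 1, 0, 0],
})

# For a 0/1 target, the mean within each category is the fraction of
# observations whose target is 1, that is, p(1) for that category
p1_toy = toy.groupby(["A1"])["A16"].mean()
print(p1_toy)
```

Category a has two 1s among three rows, so its p(1) is 2/3; category b has no 1s, so its p(1) is 0.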
- Let's create a pandas Series with the probability of the target being 0, that is,