- Separate the feature variables from the target variable:
from sklearn.model_selection import train_test_split
X = df_creditdata.iloc[:,0:23]
Y = df_creditdata['default.payment.next.month']
- Split the data into training, validation, and testing subsets:
# First split the dataset into train and test subsets (10% held out for testing)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=1)
# Then carve a validation set out of the train subset (20% of the remaining rows)
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=0.2, random_state=1)
- Check the dimensions of each subset to ensure that our splits are correct:
# Dimensions for train subsets
print(X_train.shape)
print(Y_train.shape)
# Dimensions for validation ...
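
To know what shapes to expect from the check above, it can help to work out the row counts by hand. The sketch below is not part of the original walkthrough: `split_sizes` is a hypothetical helper, the row count of 30000 is an assumed figure for illustration, and the rounding rule (held-out rows rounded up) is an assumption about how `train_test_split` handles a fractional `test_size`. For row counts that divide evenly, the rounding makes no difference.

```python
import math

def split_sizes(n_rows, test_size=0.1, val_size=0.2):
    """Return (train, validation, test) row counts for a two-stage split.

    Hypothetical helper: assumes the held-out count at each stage is
    rounded up and the remainder goes to the other subset.
    """
    # Stage 1: hold out the test subset from the full dataset.
    n_test = math.ceil(test_size * n_rows)
    n_remaining = n_rows - n_test
    # Stage 2: carve the validation subset out of what is left.
    n_val = math.ceil(val_size * n_remaining)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

n_rows = 30000  # assumed row count, for illustration only
n_train, n_val, n_test = split_sizes(n_rows)
print(n_train, n_val, n_test)
print(n_train / n_rows, n_val / n_rows, n_test / n_rows)
```

Because the second split takes 20% of the rows left after the first split, the validation subset ends up at about 18% of the full dataset (0.2 × 0.9), not 20%, leaving about 72% for training and 10% for testing.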