The customer satisfaction data was covered in Chapter 3, Logistic Regression. The prepared data is available on GitHub as an RData file and as a CSV file:
- https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R/blob/master/Data/santander_prepd.RData
- https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R/blob/master/Data/santander_prepd.csv
I'll show you how to load the RData file:
> santander <- readRDS("santander_prepd.RData")
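A note on why `readRDS()` is used here rather than `load()`: `readRDS()` restores a single object written by `saveRDS()`, whatever the file extension happens to be, and it returns the object so that you choose its name. The following is a minimal sketch of that round trip. It assumes the file was written with `saveRDS()`, which the call above suggests but the text doesn't confirm; the toy data frame and temporary path are made up for illustration:

```r
# Toy data frame standing in for the prepared data (illustrative only)
toy <- data.frame(x = 1:3, y = c(0, 1, 0))

# Write a single object with saveRDS(); the .RData extension is just a name
path <- tempfile(fileext = ".RData")
saveRDS(toy, path)

# readRDS() returns the object, so it can be assigned to any name
restored <- readRDS(path)
identical(toy, restored)
```

Had the file been created with `save()` instead, `load()` would be the matching function, and it would restore the objects under their original names.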
The data has an unbalanced response:
> table(santander$y)

    0     1
73012  3008
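To put the imbalance in proportion, here is a small sketch that works from the two counts reported above rather than from the data file, so it runs on its own. The vector name `counts` is just an illustrative choice:

```r
# Class counts copied from the table() output above
counts <- c("0" = 73012, "1" = 3008)

# Share of each class in the full dataset
props <- counts / sum(counts)
round(props, 4)

# Ratio of the majority class to the minority class
counts[["0"]] / counts[["1"]]
```

The positive class makes up just under 4% of the observations, roughly one in every 25 rows.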
We'll split the train and test sets using the same random seed as in Chapter 3, Logistic Regression:
> set.seed(1966)
> trainIndex <- caret::createDataPartition(santander$y, p = 0.8, list = FALSE)
> train <- santander[trainIndex, ...
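With a response this unbalanced, it matters that the split keeps the class ratio about the same in both sets. The sketch below illustrates that idea in base R on a synthetic response. It is an approximation for illustration only, not caret's implementation, and it won't reproduce the rows selected by `createDataPartition()`. The synthetic vector `y` and the helper names are hypothetical:

```r
set.seed(1966)

# Synthetic stand-in for the response: 4% positives (not the real data)
y <- factor(c(rep(0, 9600), rep(1, 400)))

# Sample 80% of the row indices within each class separately,
# so both classes are represented in the same proportion
idx_by_class <- split(seq_along(y), y)
train_idx <- sort(unlist(lapply(idx_by_class, function(idx) {
  sample(idx, size = round(0.8 * length(idx)))
}), use.names = FALSE))

train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Class shares should match in the two sets
prop.table(table(train_y))
prop.table(table(test_y))
```

Sampling within each class, rather than across all rows at once, keeps the rare class from being under-represented in the test set by chance.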