May 2019
Intermediate to advanced
664 pages
15h 41m
English
Next, we create our training and test data using a 70/30 split. Then, we subject the data to the standard feature exploration introduced in Chapter 1, Preparing and Understanding Data, with these tasks in mind:
First, we turn the numeric outcome into a factor, which is then used to create a stratified data index:
> y_factor <- as.factor(y)
> set.seed(1492)
> index <- caret::createDataPartition(y_factor, p = 0.7, list = FALSE)
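To make the idea of a stratified index concrete, here is a minimal base-R sketch that samples 70% of the rows within each class, so the training subset keeps the class proportions of the full data. It is not caret's implementation; the toy outcome `y` and the helper name `idx_by_class` are assumptions made up for illustration.

```r
# Toy binary outcome (hypothetical data, not the chapter's dataset)
set.seed(1492)
y <- c(rep(0, 80), rep(1, 20))
y_factor <- as.factor(y)

# Sample 70% of the row positions within each class level
idx_by_class <- lapply(split(seq_along(y_factor), y_factor), function(rows) {
  rows[sample.int(length(rows), size = round(0.7 * length(rows)))]
})
index <- sort(unlist(idx_by_class, use.names = FALSE))

# Compare class proportions in the full data and in the sampled subset
prop.table(table(y_factor))
prop.table(table(y_factor[index]))
```

Because the sampling happens per class, a rare class cannot be left out of the training subset by chance, which is the reason for stratifying on the outcome in the first place.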
Using the index, we create train/test input features and labels:
> train <- x[index, ]
> train_y <- y_factor[index]
> test <- x[-index, ]
> test_y ...
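A quick sanity check after splitting is to confirm that the two subsets are disjoint and together cover every row. The sketch below does this on toy data: `toy_x`, `toy_y`, and the hand-written `toy_index` are hypothetical stand-ins, and the held-out labels are assumed to follow the same negative-index pattern used for the features.

```r
# Hypothetical features and labels, standing in for x and y_factor
set.seed(1492)
toy_x <- data.frame(a = rnorm(10), b = runif(10))
toy_y <- factor(rep(c("0", "1"), each = 5))

# Hand-written stand-in for a partition index (7 of 10 rows)
toy_index <- c(1, 2, 3, 4, 6, 7, 8)

# Positive indexing keeps the training rows; negative indexing drops them
toy_train   <- toy_x[toy_index, ]
toy_train_y <- toy_y[toy_index]
toy_test    <- toy_x[-toy_index, ]
toy_test_y  <- toy_y[-toy_index]  # assumption: same pattern as the features

# The subsets should not overlap and should cover every row
overlap <- intersect(rownames(toy_train), rownames(toy_test))
length(overlap)
nrow(toy_train) + nrow(toy_test) == nrow(toy_x)

# Features and labels should stay aligned in length
nrow(toy_train) == length(toy_train_y)
nrow(toy_test) == length(toy_test_y)
```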