Practical Predictive Analytics by Ralph Winters

Partitioning into training and test data

Next, we will split the data into training and test datasets so that we can validate any models we produce. There are many ways to generate such a split.

In earlier chapters, we used the createDataPartition function from the caret package. For this example, we will generate the training and test data using native R functions. Read through the outline of the code here, and then run the code that follows:

  • Set a variable (TrainingRows) that determines how much of the data to designate as training data. In this example, we will use 75%.
  • Use the sample() function to randomize the rows, and assign the result to a new dataframe named ChurnStudy.
  • Then select the first TrainingRows rows. Since the df dataframe has already been sampled, ...
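
The steps outlined above can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the book's own listing: the raw_data dataframe and its columns are made-up stand-in data, the seed value is arbitrary, the names train and test are hypothetical, and TrainingRows is interpreted here as a row count derived from the 75% figure. Treating the remaining 25% of rows as the test set is also an assumption, since the last step of the outline is cut off.

```r
# Hypothetical stand-in data; the actual churn data is not shown in this excerpt.
set.seed(1020)  # arbitrary seed, used only to make the sketch reproducible
raw_data <- data.frame(
  id    = 1:200,
  churn = rbinom(200, 1, 0.3)
)

# Number of rows to designate as training data (75% of all rows).
TrainingRows <- round(0.75 * nrow(raw_data))

# Randomize the row order with sample() and assign the result to ChurnStudy.
ChurnStudy <- raw_data[sample(nrow(raw_data)), ]

# The rows are already shuffled, so the first TrainingRows rows form a random
# training set. Assumption: the remaining rows become the test set.
train <- ChurnStudy[seq_len(TrainingRows), ]
test  <- ChurnStudy[-seq_len(TrainingRows), ]
```

Because the shuffle happens once, before the split, every row lands in exactly one of the two sets, and the training set is a simple random sample of the data rather than a block of rows in their original order.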
