Creating random data partitions

Analysts need an unbiased evaluation of the quality of their machine learning models. To get this, they partition the available data into two parts. They use one part to build the machine learning model and retain the remaining data as "hold out" data. After building the model, they evaluate the model's performance on the hold out data. This recipe shows you how to partition data. It separately addresses the situation when the target variable is numeric and when it is categorical. It also covers the process of creating two partitions or three.

Getting ready

If you have not already done so, make sure that the BostonHousing.csv and boston-housing-classification.csv files from the code files of this chapter are in your ...

Get R: Recipes for Analysis, Visualization and Machine Learning now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.