December 2018
For a single split of your data into a training and a test set, use sklearn.model_selection.train_test_split. Its shuffle parameter, which is True by default, randomizes the selection of observations, and setting random_state makes that selection reproducible. There is also a stratify parameter that, for a classification problem, ensures that the train and test sets contain approximately the same shares of each class. The following code shows a basic split:
train_test_split(data, train_size=.8)
[[8, 7, 4, 10, 1, 3, 5, 2], [6, 9]]
In this case, we train a model using all data except rows 6 and 9, which are held out to generate predictions and measure the errors against the known labels. This method is useful for quick ...
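The effect of random_state and stratify can be sketched as follows. This is a minimal illustration, not the book's own example: the ten-row data list, the balanced labels list, and the seed value 42 are all assumptions made for the demonstration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten row identifiers and a balanced binary label.
data = list(range(1, 11))
labels = [0] * 5 + [1] * 5

# Fixing random_state makes the shuffled split reproducible across calls.
train_a, test_a = train_test_split(data, train_size=0.8, random_state=42)
train_b, test_b = train_test_split(data, train_size=0.8, random_state=42)
print(train_a == train_b and test_a == test_b)

# Passing the labels to stratify preserves the class shares in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, train_size=0.8, stratify=labels, random_state=42
)
print(Counter(y_train), Counter(y_test))
```

With ten rows and train_size=0.8, two rows are held out. Under stratification, each class contributes one of them, so both subsets keep the original 50/50 class balance.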