Splitting made easy
Here, we have a simple snippet that demonstrates how to use the scikit-learn library to split our data into training and test sets. We load the data from the datasets module and pass both X and y into the split function; loading the data should already be familiar. The train_test_split function comes from the model_selection submodule in sklearn and takes any number of arrays. We set test_size so that 20% of the data is held out for testing, and the remaining 80% is used for training. We define random_state so that the split is reproducible if we ever have to prove exactly how we got it. There's also the stratify keyword, which we're not using here, which can be used to stratify a split for rare ...
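The snippet itself is not reproduced in this excerpt, so the following is only a minimal sketch of the kind of split described above. The choice of the iris dataset and the value 42 for random_state are assumptions made for illustration; only train_test_split, test_size, and random_state come from the text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris stands in for whatever dataset the original snippet loads.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing. random_state fixes the shuffle
# (42 is an arbitrary choice), so the same rows land in each set every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, X_test_again, y_train_again, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With the 150-row iris data, this leaves 120 rows for training and 30 for testing. Because X and y are passed together, their rows are shuffled in step, so each label stays paired with its features.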