July 2017 · Intermediate to advanced · 382 pages · 9h 13m · English
A better sense of a model's performance can be found using what's known as a test set — but you already knew this. When the model is presented with data held out from the training procedure, we can check whether it has learned dependencies in the data that hold across the board or whether it has simply memorized the training set.
We can split the data into training and test sets using the familiar train_test_split from scikit-learn's model_selection module:
In [6]: from sklearn.model_selection import train_test_split
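As a minimal sketch of how this is typically used (the toy arrays and the 80/20 split here are illustrative assumptions, not from the text):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative toy data: 100 samples with 2 features each, binary labels
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Hold out 20% of the samples as a test set; test_size is the
# train-test ratio discussed below (0.2 is just one common choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)
```

`random_state` fixes the shuffle so the split is reproducible; without it, each call produces a different partition of the data.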
But how do we choose the right train-test ratio? Is there even such a thing as a right ratio? Or is this considered another hyperparameter of the model?
There are two competing concerns here: with less training data, the model's parameter estimates have greater variance, while with less test data, the performance estimate itself has greater variance.
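This tension can be made concrete with a quick experiment (the synthetic dataset, logistic-regression model, and specific ratios below are illustrative assumptions): for each split ratio, we measure how much the test-set accuracy estimate fluctuates across different random splits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic classification dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# For each candidate ratio, repeat the split with different seeds and
# record the spread of the resulting accuracy estimates
for test_size in (0.1, 0.3, 0.5):
    scores = []
    for seed in range(20):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = LogisticRegression().fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    print(f"test_size={test_size}: mean={np.mean(scores):.3f} "
          f"std={np.std(scores):.3f}")
```

Smaller test sets tend to show a noisier accuracy estimate, while very large test sets leave less data for fitting the model — which is exactly the trade-off behind choosing the ratio.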