February 2018
Intermediate to advanced
378 pages
10h 14m
English
Finally, we want to split our data into training and test sets. We will train our classifier only on the training set, so it will never see the test set until we want to evaluate its performance. This is a very important step, because as we will see in the future, the quality of predictions on the test set can differ dramatically from the quality measured on the training set. Data splitting is an operation specific to machine learning tasks, so we will import scikit-learn (a machine learning package) and use some functions from it:
In []: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42) X_train.shape, y_train.shape, ...
Read now
Unlock full access