June 2020
Now, let's divide the dataset into a 75% training portion and a 25% testing portion using train_test_split from sklearn.model_selection:
# train_test_split lives in sklearn.model_selection (the older sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
This has created the following four data structures:
X_train: A data structure containing the features of the training data
X_test: A data structure containing the features of the testing data
y_train: A vector containing the values of the label in the training dataset
y_test: A vector containing the values of the label in the testing dataset
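To make the four data structures concrete, here is a minimal sketch that runs the same split on stand-in data and prints the resulting shapes. The arrays X and y below are hypothetical (100 randomly generated rows with 3 features and a binary label), not the dataset used in this chapter; only the train_test_split call mirrors the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (an assumption for illustration only):
# 100 rows, 3 feature columns, and a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Same call as in the text: hold out 25% of the rows for testing.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Features keep their column count; labels are one-dimensional vectors.
print("X_train:", X_train.shape)  # (75, 3)
print("X_test: ", X_test.shape)   # (25, 3)
print("y_train:", y_train.shape)  # (75,)
print("y_test: ", y_test.shape)   # (25,)
```

Checking the shapes this way is a quick sanity test that the feature and label arrays were split consistently: X_train and y_train should have the same number of rows, as should X_test and y_test.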