December 2018 · Beginner to intermediate · 684 pages · 21h 9m · English
We will now train, visualize, and evaluate a classification tree with up to 5 consecutive splits, using 80% of the samples for training to predict the remaining 20%. We are taking a shortcut here to simplify the illustration and use the built-in train_test_split, which does not protect against lookahead bias, instead of our custom iterator. The tree configuration implies up to 2^5 = 32 leaf nodes that, on average in the balanced case, would contain over 4,300 of the training samples. Take a look at the following code:
# randomize train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)

# configure & train tree learner (the criterion value is elided in the
# source; 'gini' is scikit-learn's default)
classifier = DecisionTreeClassifier(criterion='gini', max_depth=5,
                                    random_state=42)
classifier.fit(X_train, y_train)
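The paragraph also promises visualization and evaluation of the fitted tree. A minimal, self-contained sketch of that flow might look as follows; note that the synthetic dataset from make_classification is an assumption standing in for the book's feature matrix X and binary labels y_binary, which are built earlier in the chapter:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the book's X and y_binary (assumption)
X, y_binary = make_classification(n_samples=10_000, n_features=10,
                                  random_state=42)

# Randomized 80/20 split; no protection against lookahead bias
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.2, random_state=42)

# A tree with up to 5 consecutive splits has at most 2**5 = 32 leaves
clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# Text rendering of the learned splits (plot_tree works analogously)
print(export_text(clf, max_depth=2))

# Evaluate on the 20% holdout
print(f'Leaves: {clf.get_n_leaves()}')
print(f'Test accuracy: {clf.score(X_test, y_test):.3f}')
```

Because the split is randomized rather than time-ordered, the reported accuracy is optimistic for financial data; the custom time-series iterator mentioned above is the safer choice outside of an illustration.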