May 2017
With a solid understanding of partitioning evaluation metrics, let's practice the CART tree algorithm by hand on a simulated dataset:

To begin, we decide on the first splitting point, the root, by trying out all possible values for each of the two features. We use the weighted_impurity function we just defined to calculate the weighted Gini impurity for each possible combination:
Gini(interest, Tech) = weighted_impurity([[1, 1, 0], [0, 0, 0, 1]]) = 0.405
Gini(interest, Fashion) = weighted_impurity([[0, 0], [1, 0, 1, 0, 1]]) = 0.343
Gini(interest, Sports) = weighted_impurity([[0, 1], [1, 0, 0, 1, 0]]) = ...
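
To check these hand calculations, here is a minimal sketch in Python. The book's own weighted_impurity is defined earlier and is not reproduced in this section, so the two functions below are a plain-Python re-creation under the assumption that it computes the size-weighted average of each group's Gini impurity. The splits dictionary and the scores name are introduced only for this illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one group of class labels (0.0 for an empty group)."""
    if not labels:
        return 0.0
    total = len(labels)
    # 1 minus the sum of squared class fractions
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def weighted_impurity(groups):
    """Average Gini impurity of the groups, weighted by group size.

    Assumed stand-in for the weighted_impurity function defined earlier
    in the book; the original may differ in signature and details.
    """
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_impurity(group) for group in groups)


# Label groups produced by splitting on each value of the interest feature,
# copied from the three calculations above.
splits = {
    "Tech": [[1, 1, 0], [0, 0, 0, 1]],
    "Fashion": [[0, 0], [1, 0, 1, 0, 1]],
    "Sports": [[0, 1], [1, 0, 0, 1, 0]],
}

scores = {value: weighted_impurity(groups) for value, groups in splits.items()}
for value, score in scores.items():
    print(f"Gini(interest, {value}) = {score:.3f}")
```

A lower weighted impurity means the split leaves purer child groups, so the candidate with the smallest score is the preferred one among those compared.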