Let's draw a decision tree for the tennis example we created in this recipe:
This model has a depth of three levels. Which attribute we select at each node depends on which one maximizes information gain, and we compute this by measuring the purity of the split. Purity describes how certain the outcome becomes after a split: the more a resulting subset is dominated by a single class, positive or negative, the purer it is. In this example, this equates to whether a split increases the chances of playing or the chances of not playing.
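To make attribute selection concrete, the following is a minimal sketch of how information gain could be computed for candidate splits, using entropy as the impurity measure. The four-row dataset, the attribute names `rain` and `windy`, and the helper functions `entropy` and `information_gain` are hypothetical illustrations, not the data or code from this recipe.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(rows, attribute, label="play"):
    """Parent entropy minus the size-weighted entropy of the subsets
    produced by splitting on `attribute`."""
    parent = entropy([r[label] for r in rows])
    subsets = {}
    for r in rows:
        subsets.setdefault(r[attribute], []).append(r[label])
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return parent - weighted


# Hypothetical toy data (assumption: not the dataset used in this recipe).
rows = [
    {"rain": "yes", "windy": "yes", "play": "no"},
    {"rain": "yes", "windy": "no", "play": "no"},
    {"rain": "no", "windy": "yes", "play": "yes"},
    {"rain": "no", "windy": "no", "play": "yes"},
]

# Score every candidate attribute and pick the one with the highest gain.
gains = {a: information_gain(rows, a) for a in ("rain", "windy")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy data, `rain` separates the labels perfectly, so its gain equals the full parent entropy of 1 bit, while `windy` leaves both subsets evenly mixed and gains nothing. A tree builder following this approach would therefore split on `rain` first.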
Purity is measured using entropy. Entropy is a measure of disorder in a system. In this context, it is easier to ...