August 2018
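The Gini impurity index of a node j is defined as follows, where p(i|j) is the fraction of samples in node j that belong to class i:

$$I_{Gini}(j) = \sum_{i} p(i|j)\,\bigl(1 - p(i|j)\bigr)$$
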
Here, the sum is always extended to all classes. This is a very common measure, and it is the default splitting criterion for classification trees in scikit-learn. Given a sample, the Gini impurity measures the probability of a misclassification if a label is randomly chosen according to the class probability distribution of the branch. The index reaches its minimum (0.0) when all the samples of a node belong to a single category.
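
As a quick illustration, the following minimal sketch computes the Gini impurity of a node directly from its labels, using the standard form (the sum over all classes of p(1 - p), where p is the fraction of samples in each class). The function name `gini_impurity` and the toy label lists are hypothetical and chosen only for this example; this is not how scikit-learn implements the criterion internally.

```python
from collections import Counter


def gini_impurity(labels):
    """Return the Gini impurity of a non-empty sequence of class labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    # Sum p * (1 - p) over the classes present in the node;
    # absent classes have p = 0 and contribute nothing.
    return sum((c / n) * (1.0 - c / n) for c in counts.values())


# Hypothetical toy nodes
pure = gini_impurity(["a", "a", "a", "a"])    # a single class
mixed = gini_impurity(["a", "a", "b", "b"])   # two equally likely classes

print(pure, mixed)
```

A pure node yields 0.0, the minimum mentioned above, while an even split between two classes yields 0.5. In general, the impurity grows as the class distribution of the node becomes more uniform.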