July 2017
Intermediate to advanced
360 pages
8h 26m
English
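The Gini impurity index is defined as:

$$I_{Gini}(j) = \sum_{i} p(i|j)\,\bigl(1 - p(i|j)\bigr)$$

where p(i|j) is the fraction of samples in node j that belong to class i.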
Here, the sum always extends over all classes. This is a very common measure, and it is the default splitting criterion in scikit-learn's decision tree classifiers. Given a sample, the Gini impurity measures the probability of misclassifying it if its label is chosen at random according to the class distribution of the branch. The index reaches its minimum (0.0) when all the samples in a node belong to a single class.
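As a rough illustration, the Gini impurity of a single node can be computed directly from the class labels of the samples that reach it. The following is a minimal sketch in plain Python; the function name gini_impurity and the toy label lists are hypothetical and chosen only for this example. It is not how scikit-learn implements the criterion internally.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of the class labels reaching one node (illustrative helper)."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must contain at least one sample")
    n = len(labels)
    # For each class, p is the fraction of samples in the node with that label;
    # the impurity is the sum of p * (1 - p) over all classes present.
    return sum((count / n) * (1.0 - count / n) for count in Counter(labels).values())


# A pure node: every sample belongs to the same class.
pure = gini_impurity([1, 1, 1, 1])

# A two-class node split evenly between the classes.
balanced = gini_impurity([0, 0, 1, 1])

# A three-class node with one sample per class.
three_way = gini_impurity(["a", "b", "c"])

print(pure, balanced, three_way)
```

The pure node gives an impurity of 0.0, the minimum, while the evenly split two-class node gives 0.5, the largest value possible with two classes. With more classes the maximum grows toward 1, as the three-class case (2/3) suggests.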