Gini versus Entropy

To decide which of the impurity measures to use, we first need to cover some foundational knowledge, beginning with the concept of information gain.

At its core, information gain is what it sounds like: the gain in information from moving between two states. More precisely, the information gain of an event is the difference between the amount of information known before and after the event takes place. One common measure of this information is entropy, which can be defined as:

$$H = -\sum_{j} p_j \log p_j$$

where $p_j$ is the frequency of label $j$ at a node.
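To make the definition concrete, here is a minimal Python sketch of entropy and information gain computed from label counts at a node. The function names `entropy` and `information_gain` are illustrative and do not come from the book or from Spark. Two assumptions are made: the logarithm is taken in base 2 (so entropy is measured in bits), and the "after" state of a split is summarized as the size-weighted average entropy of the child nodes, which is the usual convention for decision trees.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy (in bits) of a node, given the count of each label at that node."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # a label with zero frequency contributes nothing
            p = c / total  # p_j: frequency of label j at the node
            h -= p * math.log2(p)  # assumption: base-2 logarithm
    return h


def information_gain(parent: Sequence[int],
                     children: Sequence[Sequence[int]]) -> float:
    """Entropy before a split minus the size-weighted entropy after it."""
    n = sum(parent)
    if n == 0:
        return 0.0
    # Assumption: each child is weighted by its share of the parent's samples.
    after = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - after


if __name__ == "__main__":
    parent = [5, 5]            # 5 samples of each of two labels
    split = [[5, 0], [0, 5]]   # a split that separates the labels perfectly
    print(entropy(parent))
    print(information_gain(parent, split))
```

A node with an even mix of two labels has the highest entropy for two classes, a node containing a single label has entropy zero, and a split that separates the labels perfectly recovers all of the parent's entropy as gain.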

Now that you are familiar with the concept of information gain ...
