Mastering Machine Learning with Spark 2.x by Michal Malohlava, Max Pumperla, Alex Tellez

Gini versus Entropy

To decide which impurity measure to use, we first need to cover some foundational knowledge, beginning with the concept of information gain.

At its core, information gain is what it sounds like: the gain in information from moving between two states. More precisely, the information gain of an event is the difference between the amount of information known before the event and the amount known after it. One common measure of this information is entropy, which can be defined as:

$$\text{Entropy} = -\sum_{j} p_j \log_2(p_j)$$

where $p_j$ is the frequency of label $j$ at a node.
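To make the definitions concrete, here is a minimal Python sketch that computes the entropy of a node, -Σ_j p_j log2(p_j) with p_j the frequency of label j, and the information gain of a binary split. It assumes the common decision-tree formulation of information gain: the parent's entropy minus the size-weighted entropy of the two child nodes. The function names `entropy` and `information_gain` and the toy "yes"/"no" labels are illustrative only. They are not taken from the book or from Spark's API.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Entropy of the label distribution at a node: -sum_j p_j * log2(p_j)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # p_j = c / n is the frequency of label j; -log2(p_j) is written as log2(n / c)
    # so that a pure node returns 0.0 rather than -0.0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_gain(parent: Sequence[Hashable],
                     left: Sequence[Hashable],
                     right: Sequence[Hashable]) -> float:
    """Assumed formulation: entropy before the split minus the
    size-weighted entropy of the two children after the split."""
    n = len(parent)
    if n == 0:
        raise ValueError("parent node must contain at least one label")
    weighted_children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_children


if __name__ == "__main__":
    # Hypothetical toy node: 5 "yes" and 5 "no" labels, split into two purer children.
    parent = ["yes"] * 5 + ["no"] * 5
    left = ["yes"] * 4 + ["no"] * 1
    right = ["yes"] * 1 + ["no"] * 4
    print(f"entropy(parent)  = {entropy(parent):.3f}")   # 1.000
    print(f"information gain = {information_gain(parent, left, right):.3f}")  # 0.278
```

A perfectly mixed two-class node has an entropy of 1 bit, and a pure node has an entropy of 0. The split above lowers each child's entropy to about 0.722 bits, so the gain is roughly 0.278 bits. A split that separated the labels perfectly would yield the maximum gain of 1 bit.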

Now that you are familiar with the concept of information gain ...