Apache Spark 2.x Machine Learning Cookbook by Shuen Mei, Broderick Hall, Meenakshi Rajendran, Siamak Amirghodsi

Measures of impurity

With all machine learning algorithms, we try to optimize an objective (cost) function that helps us select the best move; for a decision tree, that move is the split chosen at each node. Spark offers three impurity measures to choose from. The following figure depicts the alternatives:

In this section, we will discuss each of the three possible alternatives:

  • Information gain: Loosely speaking, this measures the level of impurity in a group based on the concept of entropy, which comes from Shannon's information theory and was later applied to decision trees by Quinlan in his ID3 algorithm.

The calculation of entropy is shown in the following equation:
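
For reference, Shannon entropy over class proportions p_i is usually written H(S) = -Σ p_i log2(p_i). The following is a minimal Python sketch of that calculation, assuming plain lists of labels rather than a Spark dataset. The function name `entropy` is illustrative and is not part of Spark's API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels.

    Illustrative helper (not a Spark function).
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # p * log2(1/p) equals -p * log2(p); summed over the classes present.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A pure group has no impurity; an evenly mixed two-class group has one bit.
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0
print(entropy(["yes", "yes", "no", "no"]))    # 1.0
```

The more evenly the classes are mixed, the higher the entropy, which is why it serves as a measure of impurity.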

Information gain helps us to select ...
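
As a rough illustration, information gain can be sketched as the parent group's entropy minus the size-weighted entropy of the groups produced by a split. This assumes the usual ID3-style definition. The helper names `entropy` and `information_gain` are hypothetical and are not Spark functions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups.

    Assumes the children partition the parent (ID3-style definition).
    """
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]

# A split that separates the classes perfectly removes all impurity.
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0

# A split that leaves each child as mixed as the parent gains nothing.
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

Under this definition, a tree builder would prefer the candidate split with the larger gain.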
