Entropy and information gain

Before we explain how to create a Decision Tree, we need to introduce two important concepts—entropy and information gain.

Entropy measures the homogeneity of a dataset. Imagine a dataset of 10 observations with a single attribute, as shown in the following diagram, where the value of this attribute is A for all 10 observations. This dataset is completely homogeneous, and it is easy to predict the value of the next observation: it will probably be A:

The entropy of a completely homogeneous dataset is zero. Now imagine a similar dataset, but one in which each observation has a different value, as shown in the following diagram: ...
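The two extremes described above can be checked numerically. The following is a minimal sketch of Shannon entropy over class labels (the standard definition used for decision trees); the function name `entropy` and the example datasets are illustrative, not taken from the text:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a collection of labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Completely homogeneous dataset: all 10 observations have value A,
# so the entropy is 0 and the next observation is easy to predict.
homogeneous = ["A"] * 10
print(entropy(homogeneous))

# Every observation has a different value: entropy is at its maximum
# for 10 classes, log2(10), and prediction is hardest.
heterogeneous = list("ABCDEFGHIJ")
print(entropy(heterogeneous))
```

A homogeneous dataset yields an entropy of 0 bits, while 10 equally likely distinct values yield log2(10), roughly 3.32 bits, matching the intuition that entropy grows as the data becomes harder to predict.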
