The information gain criterion is based on the notion of Shannon entropy, a central concept in information theory, physics, and other domains. Mathematically, it is expressed as:

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$

Here, *i* is a state of the system, *N* is the total number of possible states, and *p_i* is the probability of the system being in state *i*. Entropy describes the amount of uncertainty in the system. The more order there is in the system, the lower its entropy.
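
To make the definition concrete, here is a minimal Python sketch that computes the entropy of a discrete distribution as the sum of -*p_i* log *p_i* over all states. It assumes a base-2 logarithm, so the result is in bits. The function names `shannon_entropy` and `entropy_of_labels` are illustrative only and do not come from any particular library.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Return the Shannon entropy of a discrete distribution, in bits.

    Assumes the probabilities sum to 1 and that a base-2 logarithm is wanted.
    """
    # States with p == 0 are skipped: p * log2(p) tends to 0 as p tends to 0.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def entropy_of_labels(labels):
    """Estimate entropy from observed labels, using relative frequencies as p_i."""
    counts = Counter(labels)
    total = len(labels)
    return shannon_entropy(count / total for count in counts.values())


print(shannon_entropy([0.5, 0.5]))    # two equally likely states: 1.0 bit
print(shannon_entropy([1.0]))         # one certain state: 0.0 bits
print(shannon_entropy([0.25] * 4))    # four equally likely states: 2.0 bits
print(entropy_of_labels(["a", "a", "b", "b"]))  # evenly mixed labels: 1.0 bit
```

A perfectly ordered system (one state with probability 1) has zero entropy, while a uniform distribution over *N* states has the maximum possible entropy for that *N*.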

For a visual introduction to information theory, see *Visual Information Theory* by Christopher Olah at: http://colah.github.io/posts/2015-09-Visual-Information/ ...