Now, I would like to briefly talk about the Kullback-Leibler (KL) divergence. This is a concept you may encounter when reading about statistics, machine learning, information theory, or statistical mechanics. You could argue that the KL divergence, like other recurring concepts such as entropy or the marginal likelihood, keeps showing up simply because, at least in part, all of these disciplines are discussing the same sets of problems, just from slightly different perspectives.
The KL divergence is useful because it gives us a way of measuring how close two distributions are, and it is defined as follows:

$$\mathbb{KL}(p \parallel q) = \mathbb{E}_p\!\left[\log \frac{p(x)}{q(x)}\right]$$

This reads as the Kullback-Leibler divergence from $q$ to $p$ (yes, you have to read it ...