Given a probability space S and two events A and B, the probability of A given B is called the conditional probability and, provided that P(B) > 0, it's defined as:

P(A|B) = P(A, B) / P(B)
As P(A, B) = P(B, A), the joint probability can also be written as P(B|A)P(A), and it's possible to derive Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)
This theorem expresses a conditional probability as a function of the inverse conditional probability and the two marginal probabilities P(A) and P(B). This result is fundamental to many machine learning problems because, as we're going to see in this and the following chapters, normally it's easier to work ...
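To make the relationship between the two conditional probabilities concrete, here is a minimal sketch that checks Bayes' theorem numerically. The joint distribution and the helper names (joint, marginal_a, marginal_b, conditional) are hypothetical, chosen only for illustration, and exact fractions are used so that the equality can be tested without rounding errors:

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events A and B.
# Keys are (a, b) outcomes; values are P(A=a, B=b) and sum to 1.
joint = {
    (True, True): Fraction(1, 10),
    (True, False): Fraction(2, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}


def marginal_a(joint):
    """P(A): sum the joint probability over every outcome where A occurs."""
    return sum(p for (a, _), p in joint.items() if a)


def marginal_b(joint):
    """P(B): sum the joint probability over every outcome where B occurs."""
    return sum(p for (_, b), p in joint.items() if b)


def conditional(p_joint, p_given):
    """P(X|Y) = P(X, Y) / P(Y), defined only when P(Y) > 0."""
    if p_given == 0:
        raise ValueError("the conditioning event has zero probability")
    return p_joint / p_given


p_a = marginal_a(joint)        # P(A) = 3/10
p_b = marginal_b(joint)        # P(B) = 2/5
p_ab = joint[(True, True)]     # P(A, B) = 1/10

# Both conditionals computed directly from the definition.
p_a_given_b = conditional(p_ab, p_b)
p_b_given_a = conditional(p_ab, p_a)

# Bayes' theorem: rebuild P(A|B) from P(B|A) and the two marginals.
bayes = p_b_given_a * p_a / p_b

print(p_a_given_b, p_b_given_a, bayes)  # 1/4 1/3 1/4
```

In this toy distribution, P(A|B) and P(B|A) differ (1/4 versus 1/3), yet the value recovered through Bayes' theorem matches the one computed from the definition.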