This is the first of two chapters dedicated to Bayesian learning. The main concepts and philosophy behind Bayesian inference are introduced. The evidence function and its relation to Occam’s razor are presented. The expectation-maximization (EM) algorithm is derived and applied to linear regression and Gaussian mixture modeling. The k-means algorithm for clustering and its affinity to Gaussian mixture modeling are discussed. Finally, the concept of probabilistic model mixing is reviewed and the notion of a mixture of experts is presented.