Chapter 4

EM Optimization Methods

The expectation–maximization (EM) algorithm is an iterative optimization strategy motivated by a notion of missingness and by consideration of the conditional distribution of what is missing given what is observed. The strategy's statistical foundations and effectiveness in a variety of statistical problems were shown in a seminal paper by Dempster, Laird, and Rubin [150]. Other references on EM and related methods include [409, 413, 449, 456, 625]. The popularity of the EM algorithm stems from how simple it can be to implement and how reliably it can find the global optimum through stable, uphill steps.

In a frequentist setting, we may conceive of observed data generated from random variables X along with missing or unobserved data from random variables Z. We envision complete data generated from Y = (X, Z). Given observed data x, we wish to maximize a likelihood L(θ|x). Often it will be difficult to work with this likelihood and easier to work with the densities of Y|θ and Z|(x, θ). The EM algorithm sidesteps direct consideration of L(θ|x) by working with these easier densities.

In a Bayesian application, interest often focuses on estimating the mode of a posterior distribution f(θ|x). Again, optimization can sometimes be simplified by consideration of unobserved random variables ψ in addition to the parameters of interest, θ.

The missing data may not truly be missing: They may be only a conceptual ploy that simplifies the problem. In this case, ...

Get Computational Statistics, 2nd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.