10 Expectation Maximization

10.1 Introduction

In machine learning and statistics, computing the maximum likelihood (ML) or maximum a posteriori (MAP) parameter estimate relies on the availability of complete data. However, if the model contains latent variables or there is missing data, ML and MAP estimation become challenging. In such cases, gradient descent methods can be used to find a local minimum of the negative log likelihood (NLL) [77]:

$$\mathrm{NLL}(\boldsymbol{\theta}) = -\frac{1}{N+1}\,\log p_{\boldsymbol{\theta}}(\mathcal{D}), \tag{10.1}$$

where $\boldsymbol{\theta}$ represents the set of model parameters and $\mathcal{D}$ denotes the set of $N+1$ observed data points indexed by $k = 0, \ldots, N$. It is often necessary to impose additional constraints on the model, such as requiring that mixing weights be normalized and covariance matrices be positive definite. The expectation-maximization (EM) algorithm paves the way for addressing this issue. As an iterative algorithm, EM enforces the required constraints and handles the missing-data problem by alternating between two steps [77], illustrated by the sketch following this list:

  • E‐step: inferring the missing values given ...
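To make the alternation concrete, the following is a minimal sketch (not taken from the book) of EM for a two-component one-dimensional Gaussian mixture. The synthetic data, initial values, and variable names are assumptions chosen for illustration; the E-step infers the missing component labels via responsibilities, and the M-step re-estimates the parameters so that the mixing weights stay normalized and the variances stay positive. The printed quantity is the average NLL per data point, in the spirit of (10.1).

```python
import numpy as np

# Illustrative EM for a two-component 1-D Gaussian mixture.
# Data and initial guesses below are made up for demonstration only.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 0.5, 100)])
N = len(x)

# Initial parameter guesses: mixing weights, means, variances.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def log_gauss(x, mu, var):
    # Log-density of a univariate Gaussian, evaluated element-wise.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

for it in range(100):
    # E-step: infer the missing component labels given the current parameters
    # by computing the posterior responsibility of each component for each point.
    log_p = np.log(w) + log_gauss(x[:, None], mu, var)            # shape (N, 2)
    log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)  # log-marginal per point
    r = np.exp(log_p - log_norm)                                  # responsibilities

    # M-step: re-estimate parameters from the "filled-in" data; the mixing
    # weights are normalized by construction and the variances remain positive.
    Nk = r.sum(axis=0)
    w = Nk / N
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

    # Average NLL per data point; EM never increases this value.
    nll = -log_norm.sum() / N
    if it % 10 == 0:
        print(f"iter {it:3d}  NLL {nll:.4f}")
```

Running this sketch shows the NLL decreasing monotonically until the parameters settle near the values used to generate the data, which is the behavior the two-step alternation is designed to guarantee.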
