Chapter 2

Optimization and Solving Nonlinear Equations

Maximum likelihood estimation is central to statistical inference. Long hours can be invested in learning about the theoretical performance of MLEs and their analytic derivation. Faced with a complex likelihood lacking an analytic solution, however, many people are unsure how to proceed.

Most functions cannot be optimized analytically. For example, consider maximizing g(x) = (log x)/(1 + x) with respect to x. The derivative is g'(x) = (1 + 1/x − log x)/(1 + x)^2, so setting it equal to zero and solving for x leads to an algebraic impasse because 1 + 1/x − log x = 0 has no analytic solution. Many realistic statistical models induce likelihoods that cannot be optimized analytically; indeed, we would argue that greater realism is strongly associated with reduced ability to find optima analytically.
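Although the equation 1 + 1/x − log x = 0 resists algebra, it is easy to solve numerically. The sketch below is only an illustration, not a method prescribed by the text at this point: it applies plain bisection to h(x) = 1 + 1/x − log x, the numerator of g'(x). The function names (g, h, bisect), the starting bracket [1, 5], and the tolerance are all assumptions made for this example. The bracket is valid because h(1) = 2 > 0 and h(5) = 1.2 − log 5 < 0.

```python
import math


def g(x):
    """Objective from the example: g(x) = log(x) / (1 + x), defined for x > 0."""
    return math.log(x) / (1.0 + x)


def h(x):
    """Numerator of g'(x); g'(x) = h(x) / (1 + x)^2, so both share the same root."""
    return 1.0 + 1.0 / x - math.log(x)


def bisect(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by bisection.

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop when the root is hit exactly or the bracket is small enough.
        if f_mid == 0.0 or 0.5 * (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                   # sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid      # sign change lies in the right half
    return 0.5 * (lo + hi)


# Illustrative bracket (an assumption): h(1) > 0 and h(5) < 0.
x_star = bisect(h, 1.0, 5.0)
print(f"x* = {x_star:.6f}, g(x*) = {g(x_star):.6f}")
```

The root lies near x = 3.59. Since h changes sign from positive to negative there, g increases before that point and decreases after it, so the root is a maximizer of g on this bracket. Bisection was chosen here only because it needs nothing beyond a sign change; it is slow compared with derivative-based alternatives.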

Statisticians face other optimization tasks, too, aside from maximum likelihood. Minimizing risk in a Bayesian decision problem, solving nonlinear least squares problems, finding highest posterior density intervals for many distributions, and a wide variety of other tasks all involve optimization. Such diverse tasks are all versions of the following generic problem: Optimize a real-valued function g with respect to its argument, a p-dimensional vector x. In this chapter, we will limit consideration to g that are smooth and differentiable with respect to x; in Chapter 3 we discuss optimization when g is defined over a discrete domain. There is no meaningful distinction between maximization ...