
3.3.2 Algorithms
Although one could tackle this problem with standard convex-optimization
software, we have found coordinate-descent to be particularly effective
(Friedman, Hastie, Simon and Tibshirani 2015). In the two-class case, there
is an outer Newton loop and an inner weighted least-squares step. The outer
loop can be seen as making a quadratic approximation to the log-likelihood,
centered at the current estimates $\{\tilde{\beta}_{0k}, \tilde{\beta}_k\}_{k=1}^K$. Here we do the same, except
we hold all but one class’s parameters fixed when making this approximation.
In detail, when updating the parameters $(\beta_{0\ell}, \beta_\ell)$, we form the quadratic
function ...
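
The class-by-class scheme described in this paragraph can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes an $\ell_1$ penalty with parameter `lam` on each class's coefficients, and it assumes the usual iteratively reweighted least-squares form for the per-class quadratic approximation (weights $w_{i\ell} = \tilde{p}_\ell(x_i)(1 - \tilde{p}_\ell(x_i))$ and a working response built from the current estimates). The function names, the weight floor of `1e-5`, and the fixed number of inner sweeps are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax_probs(X, b0, B):
    """Class probabilities. X is (N, p), b0 is (K,), B is (p, K)."""
    eta = b0 + X @ B
    eta -= eta.max(axis=1, keepdims=True)  # subtract row max for stability
    P = np.exp(eta)
    return P / P.sum(axis=1, keepdims=True)

def soft_threshold(z, g):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def update_class(X, Y, b0, B, l, lam, n_sweeps=5):
    """Update (b0[l], B[:, l]) in place, holding all other classes fixed.

    Assumed form: a weighted least-squares surrogate for class l,
    minimized by cyclic coordinate descent with an l1 penalty.
    Assumes no column of X is identically zero.
    """
    N, p = X.shape
    P = softmax_probs(X, b0, B)
    # Newton weights, floored to avoid division by near-zero values.
    w = np.clip(P[:, l] * (1.0 - P[:, l]), 1e-5, None)
    # Working response centered at the current estimates for class l.
    z = b0[l] + X @ B[:, l] + (Y[:, l] - P[:, l]) / w
    beta0, beta = b0[l], B[:, l].copy()
    for _ in range(n_sweeps):
        r = z - beta0 - X @ beta
        beta0 += np.sum(w * r) / np.sum(w)  # intercept is not penalized
        r = z - beta0 - X @ beta
        for j in range(p):
            r_j = r + X[:, j] * beta[j]     # partial residual without x_j
            num = np.sum(w * X[:, j] * r_j) / N
            den = np.sum(w * X[:, j] ** 2) / N
            new = soft_threshold(num, lam) / den
            r = r_j - X[:, j] * new
            beta[j] = new
    b0[l], B[:, l] = beta0, beta

def fit(X, y, K, lam, n_outer=50):
    """Cycle over classes, applying one partial Newton update to each."""
    N, p = X.shape
    Y = np.eye(K)[y]                        # one-hot responses, (N, K)
    b0, B = np.zeros(K), np.zeros((p, K))
    for _ in range(n_outer):
        for l in range(K):
            update_class(X, Y, b0, B, l, lam)
    return b0, B
```

Calling `fit(X, y, K, lam)` with integer labels `y` in `0, ..., K-1` returns the intercepts and a `(p, K)` coefficient matrix. A production solver would add convergence checks and further safeguards that this sketch leaves out.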