
Ordinary linear regression 75
where we have assumed σ² = 1 for simplicity; the parameter λ defines a trade-off between minimum residual error and minimum norm. Equating the vector derivative with respect to w with zero as before then leads to the normal equation

    X⊤y = (X⊤X + λI_{N+1}) w,    (2.102)
where the identity matrix I_{N+1} has dimensions (N + 1) × (N + 1). The least squares estimate for the parameter vector w is now

    ŵ = (X⊤X + λI_{N+1})⁻¹ X⊤y.    (2.103)
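As a numerical illustration of Equation (2.103), the sketch below evaluates the ridge estimate with NumPy on a small synthetic data set (the design matrix, targets, and λ value are arbitrary assumptions for demonstration). Solving the linear system of Equation (2.102) directly is numerically preferable to forming the explicit inverse, though both are shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: M samples, N features plus a bias column, so X has N+1 columns
M, N = 20, 3
X = np.hstack([np.ones((M, 1)), rng.normal(size=(M, N))])
y = rng.normal(size=M)

lam = 0.5  # regularization parameter λ > 0

# Solve (X⊤X + λI_{N+1}) w = X⊤y, i.e. the normal equation (2.102)
A = X.T @ X + lam * np.eye(N + 1)
w_hat = np.linalg.solve(A, X.T @ y)

# Equivalent explicit-inverse form of Equation (2.103), less stable in practice
w_hat_inv = np.linalg.inv(A) @ X.T @ y
assert np.allclose(w_hat, w_hat_inv)
```

Larger λ shrinks the norm of ŵ at the cost of a larger residual, reflecting the trade-off described above.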
For λ > 0, the matrix X⊤X + λI_{N+1} can always be inverted. This procedure is known as ridge regression, and Equation (2.103) may be referred to as its primal solution. Regularization will be encountered in Chapter ...
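The invertibility claim can be checked numerically: X⊤X is positive semidefinite, so adding λI_{N+1} shifts every eigenvalue up by λ, making the sum positive definite. A minimal sketch, using an assumed rank-deficient design matrix for which plain least squares would have no unique solution:

```python
import numpy as np

# Rank-deficient design matrix: the second column duplicates the first,
# so X⊤X is singular and (X⊤X)⁻¹ does not exist
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

lam = 0.1
A = X.T @ X + lam * np.eye(2)

# Eigenvalues of X⊤X are >= 0; adding λI shifts them all up by λ,
# so the smallest eigenvalue of A is at least λ and A is invertible
eigvals = np.linalg.eigvalsh(A)
assert eigvals.min() >= lam - 1e-12
print(np.linalg.matrix_rank(X.T @ X), np.linalg.matrix_rank(A))  # 1 2
```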