## 11.9 Kernel Ridge Regression Revisited

Kernel ridge regression was introduced in Section 11.7. Here, it is restated in its dual representation. Ridge regression in its primal representation can be cast as

$$
\begin{aligned}
\underset{\boldsymbol{\theta},\,\boldsymbol{\xi}}{\text{minimize}}\quad & J(\boldsymbol{\theta},\boldsymbol{\xi})=\sum_{n=1}^{N}\xi_n^2 + C\|\boldsymbol{\theta}\|^2,\\
\text{subject to}\quad & y_n-\boldsymbol{\theta}^{T}\boldsymbol{x}_n=\xi_n,\quad n=1,2,\ldots,N,
\end{aligned}
$$

(11.51)

which leads to the following Lagrangian:

$$
L(\boldsymbol{\theta},\boldsymbol{\xi},\boldsymbol{\lambda})=\sum_{n=1}^{N}\xi_n^2 + C\|\boldsymbol{\theta}\|^2 + \sum_{n=1}^{N}\lambda_n\left(y_n-\boldsymbol{\theta}^{T}\boldsymbol{x}_n-\xi_n\right).
$$

(11.52)

Differentiating with respect to θ and ξn, n = 1,2,…,N, and equating to zero, we obtain

$$
\boldsymbol{\theta}=\frac{1}{2C}\sum_{n=1}^{N}\lambda_n\boldsymbol{x}_n
$$

(11.53)

and

$$
\xi_n=\frac{\lambda_n}{2},\quad n=1,2,\ldots,N.
$$

(11.54)
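For completeness, the dual representation follows by eliminating the primal variables: differentiation with respect to $\xi_n$ gives $\xi_n = \lambda_n/2$, and substituting this together with (11.53) into the equality constraints yields a linear system in the Lagrange multipliers alone,

$$
y_n = \frac{1}{2C}\sum_{m=1}^{N}\lambda_m\,\boldsymbol{x}_m^{T}\boldsymbol{x}_n + \frac{\lambda_n}{2},\quad n=1,2,\ldots,N,
\qquad\Longrightarrow\qquad
\boldsymbol{\lambda} = 2C\left(K + CI\right)^{-1}\boldsymbol{y},
$$

where $K$ is the Gram matrix with entries $K_{nm}=\boldsymbol{x}_n^{T}\boldsymbol{x}_m$. In the kernel version, each inner product $\boldsymbol{x}_m^{T}\boldsymbol{x}_n$ is replaced by a kernel evaluation $\kappa(\boldsymbol{x}_m,\boldsymbol{x}_n)$.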

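The dual solution above can be sketched in a few lines of NumPy. The sketch below (function names are illustrative, and a Gaussian kernel is assumed for concreteness) solves for the dual coefficients $a_n=\lambda_n/(2C)$ via $\boldsymbol{a}=(K+CI)^{-1}\boldsymbol{y}$ and predicts with $f(\boldsymbol{x})=\sum_n a_n\,\kappa(\boldsymbol{x}_n,\boldsymbol{x})$:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X1 and X2
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ridge_fit(X, y, C=1.0, sigma=1.0):
    # Dual coefficients a = (K + C I)^{-1} y, i.e., a_n = lambda_n / (2C)
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + C * np.eye(len(X)), y)

def kernel_ridge_predict(X_train, a, X_test, sigma=1.0):
    # f(x) = sum_n a_n * kappa(x_n, x)
    return rbf_kernel(X_test, X_train, sigma) @ a

# Illustrative usage: fit a noiseless sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (40, 1))
y = np.sin(X[:, 0])
a = kernel_ridge_fit(X, y, C=1e-3, sigma=1.0)
pred = kernel_ridge_predict(X, a, X, sigma=1.0)
```

Note that the only matrix inverted has size $N \times N$; the dimensionality of $\boldsymbol{x}$ never enters, which is what makes the kernelized version practical in high (or infinite) dimensional feature spaces.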