11.9 Kernel Ridge Regression Revisited

Kernel ridge regression was introduced in Section 11.7. Here, it is restated in its dual representation form. In its primal representation, ridge regression can be cast as

$$\begin{aligned}
\underset{\boldsymbol{\theta},\,\boldsymbol{\xi}}{\text{minimize}} \quad & J(\boldsymbol{\theta}, \boldsymbol{\xi}) = \sum_{n=1}^{N} \xi_n^2 + C \|\boldsymbol{\theta}\|^2, \\
\text{subject to} \quad & y_n - \boldsymbol{\theta}^T \boldsymbol{x}_n = \xi_n, \quad n = 1, 2, \ldots, N,
\end{aligned}$$

(11.51)

which leads to the following Lagrangian:

$$L(\boldsymbol{\theta}, \boldsymbol{\xi}, \boldsymbol{\lambda}) = \sum_{n=1}^{N} \xi_n^2 + C \|\boldsymbol{\theta}\|^2 + \sum_{n=1}^{N} \lambda_n \left( y_n - \boldsymbol{\theta}^T \boldsymbol{x}_n - \xi_n \right).$$

(11.52)

Differentiating with respect to $\boldsymbol{\theta}$ and $\xi_n$, $n = 1, 2, \ldots, N$, and setting the derivatives equal to zero, we obtain

$$\boldsymbol{\theta} = \frac{1}{2C} \sum_{n=1}^{N} \lambda_n \boldsymbol{x}_n$$

(11.53)

and

$$\xi_n = \frac{\lambda_n}{2}, \quad n = 1, 2, \ldots, N.$$

(11.54)
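Substituting these stationarity conditions back into the constraints of (11.51) yields the dual solution: with the Gram matrix $K = XX^T$, one obtains $\boldsymbol{\lambda} = 2C(K + CI)^{-1}\boldsymbol{y}$ and hence $\boldsymbol{\theta} = X^T (K + CI)^{-1} \boldsymbol{y}$. A minimal numerical sketch (not from the book; the synthetic data and variable names are illustrative) checks that this dual route reproduces the familiar primal ridge solution $\boldsymbol{\theta} = (X^T X + CI)^{-1} X^T \boldsymbol{y}$:

```python
# Sketch: primal vs. dual solution of ridge regression on synthetic data.
# C is the regularization constant of Eq. (11.51); rows of X are the x_n.
import numpy as np

rng = np.random.default_rng(0)
N, d, C = 20, 3, 0.5
X = rng.standard_normal((N, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(N)

# Primal solution: theta = (X^T X + C I)^{-1} X^T y.
theta_primal = np.linalg.solve(X.T @ X + C * np.eye(d), X.T @ y)

# Dual route: lambda = 2C (K + C I)^{-1} y with the linear kernel K = X X^T,
# then theta = (1/2C) * X^T lambda, as in Eq. (11.53).
K = X @ X.T
lam = 2 * C * np.linalg.solve(K + C * np.eye(N), y)
theta_dual = X.T @ lam / (2 * C)

print(np.allclose(theta_primal, theta_dual))  # prints True
```

The dual form involves the $N \times N$ matrix $K + CI$ rather than the $d \times d$ primal matrix; its practical value is that $K$ depends on the data only through inner products, which is what allows the kernel substitution of Section 11.7.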
