10 Supervised Learning

In this chapter, we will discuss supervised learning. What distinguishes supervised learning problems from unsupervised learning problems is that the data come in pairs, i.e. we may say $(x_k, y_k) \in \mathbb{R} \times \mathbb{R}$ for $k \in \mathbb{N}_N$, and we would like to find a relationship between the pairs of data. We will start with linear regression. This does not mean that the data pairs are related to one another in a linear way; rather, it is the class of functions that we consider that is parameterized in a linear way. First, we will do this in a finite-dimensional space, where we will also discuss statistical interpretations and generalizations such as maximum likelihood estimation, maximum a posteriori estimation, and regularization. We will then carry out regression in an infinite-dimensional space, i.e. in a Hilbert space, and we will see that this is equivalent to maximum a posteriori estimation for so-called Gaussian processes. Then we will discuss classification using linear regression, logistic regression, support vector machines, and the restricted Boltzmann machine. The chapter finishes with artificial neural networks and the so-called back-propagation algorithm. We also discuss a form of implicit regularization known as dropout.
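To make the linear-in-parameters point concrete, here is a minimal sketch (not from the book) that fits a quadratic model by ordinary least squares in NumPy. The model is nonlinear in $x$, yet linear in the unknown coefficients $w$, which is the only sense in which linear regression is "linear"; the data, feature map, and coefficient values below are illustrative assumptions.

```python
import numpy as np

# Synthetic data: y is a quadratic function of x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)

# Design matrix with columns [1, x, x^2]: the model Phi @ w is
# nonlinear in x but linear in the parameter vector w.
Phi = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares estimate of w.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # close to the true coefficients [1, -2, 3]
```

The same least-squares machinery applies unchanged to any fixed set of basis functions, since only the parameterization needs to be linear.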

10.1 Linear Regression

We start by considering the problem ...
