9 Unsupervised Learning

We are now going to discuss unsupervised learning. This is about finding lower-dimensional descriptions of a set of data $\{x_1, \ldots, x_N\}$. One simple such lower-dimensional description is the mean of the data. Another could be a probability density function from which the data could have been drawn. We will see that there are many more lower-dimensional descriptions of data. We start the chapter by defining entropy, and we will see that many of the probability density functions that are of interest in learning can be derived from the so-called "maximum entropy principle." Specifically, we will derive the categorical distribution, the Ising distribution, and the normal distribution. There is a close relationship between the Lagrange dual function of the maximum entropy problem and maximum likelihood (ML) estimation, which will also be investigated. Other topics that we cover are prediction, graphical models, cross entropy, the expectation-maximization algorithm, the Boltzmann machine, principal component analysis, mutual information, and cluster analysis. As a prelude to entropy, we start by discussing the so-called Chebyshev bounds.
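To make the idea of a lower-dimensional description concrete, here is a minimal sketch in Python/NumPy (synthetic data and variable names are illustrative assumptions, not the book's notation) that summarizes $N$ data points by their sample mean and by the parameters of a fitted normal density:

```python
import numpy as np

# Hypothetical data set {x_1, ..., x_N} of N points in R^n (synthetic here)
rng = np.random.default_rng(0)
N, n = 500, 3
X = rng.normal(size=(N, n))          # rows are the data points x_i

# Two simple lower-dimensional descriptions of the data:
mean = X.mean(axis=0)                # sample mean: n numbers instead of N*n
cov = np.cov(X, rowvar=False)        # sample covariance of a fitted normal density
print(mean)
print(cov)
```

The point of the sketch is only that a handful of summary statistics (here a mean vector and a covariance matrix) can stand in for the full data set; the chapter develops more principled ways of choosing such descriptions.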

9.1 Chebyshev Bounds

Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a random variable $X$. In this section, we will bound $\mathbb{P}[X \in C]$ for some set $C$ using the Chebyshev bounds.
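As a point of reference (a standard special case, not necessarily the form derived in this section), the classical Chebyshev inequality bounds the probability that a scalar random variable deviates from its mean using only its first two moments:

\[
  \mathbb{P}\bigl[\,|X - \mu| \ge a\,\bigr] \;\le\; \frac{\sigma^2}{a^2}, \qquad a > 0,
\]

where $\mu$ and $\sigma^2$ are the mean and variance of $X$. Taking $C = \{x : |x - \mu| \ge a\}$, this is an instance of bounding $\mathbb{P}[X \in C]$ without assuming a particular distribution for $X$.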
