9 Unsupervised Learning
We are now going to discuss unsupervised learning. This is about finding lower‐dimensional descriptions of a set of data. One simple such lower‐dimensional description is the mean of the data. Another could be a probability function of which the data are an outcome. We will see that there are many more lower‐dimensional descriptions of data. We start the chapter by defining entropy, and we will see that many of the probability density functions that are of interest in learning can be derived from the so‐called “maximum entropy principle.” Specifically, we will derive the categorical distribution, the Ising distribution, and the normal distribution. There is a close relationship between the Lagrange dual function of the maximum entropy problem and maximum likelihood (ML) estimation, which will also be investigated. Other topics that we cover are prediction, graphical models, cross entropy, the expectation maximization algorithm, the Boltzmann machine, principal component analysis, mutual information, and cluster analysis. As a prelude to entropy, we begin by discussing the so‐called Chebyshev bounds.
9.1 Chebyshev Bounds
Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a random variable $X$ defined on it. In this section, we will bound the probability $\mathbb{P}[X \in C]$ for some set $C$ using the ...
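As a point of reference, recall the classical scalar form of such a bound (stated here in our own notation, with $\mu$, $\sigma^2$, and $t$ chosen for illustration rather than taken from the text): if $X$ has mean $\mu$ and finite variance $\sigma^2$, then for every $t > 0$,

$$\mathbb{P}\bigl[\,|X - \mu| \ge t\,\bigr] \;\le\; \frac{\sigma^2}{t^2}.$$

For instance, with $\sigma = 1$ and $t = 3$, the probability of observing a deviation of at least three from the mean is no more than $1/9$, regardless of the distribution of $X$.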