8 Neuromanifolds and the Uniqueness of Relative Entropy
8.1 Overview
This chapter develops the theoretical detail behind modern arguments for the choice of relative entropy as the difference measure between probability distributions. Also shown is a fundamentally derived variant of Expectation Maximization (EM), referred to as “em.” Following the derivations of Amari [113–115], geometric representations are considered for two types of statistical objects: families of probability distributions and families of neural networks. Emphasis is placed on Amari’s dually flat formulation of “information geometry,” but the development is kept general enough that other formulations might be considered as well.
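For reference, recall the standard definition of relative entropy (the Kullback–Leibler divergence) between distributions p and q over a discrete sample space; it is stated here for convenience and is not reproduced from the chapter’s later uniqueness derivation:

$$
D(p \,\|\, q) \;=\; \sum_{x} p(x)\,\log\frac{p(x)}{q(x)} \;\geq\; 0,
$$

with equality if and only if p = q. Note that D is not symmetric in its arguments, which is one reason its status as the natural difference measure requires argument. Roughly speaking, Amari’s “em” algorithm alternates e-projection and m-projection steps, each of which minimizes this divergence between a data manifold and a model manifold.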
Why establish a geometric formulation? Here are some motivations:
- to obtain a concise representation of the elements of families of probability distributions,
- if a dynamical description is eventually sought, as in adaptive algorithms that “learn” or optimize, then an established kinematical representation can offer much-needed clarity,
- if a variational formalism is provided, the spatial and “temporal” (inertial) kinematical structures can be clearly expressed, separate from the unknown dynamical structures.
Once the geometric formulation is established, we stand to gain substantially from a systematic exploration of the various algorithms, permitting the more efficient algorithms to be isolated. The search for a more efficient algorithm then becomes simpler, in part, in that effort may be focused on the dynamical element ...