Contrastive divergence (CD-k)
Contrastive divergence can be thought of as an approximate maximum-likelihood learning algorithm. It computes the difference between the positive phase (energy of the first encoding) and the negative phase (energy of the last encoding). Exact maximum-likelihood learning is equivalent to minimizing the KL divergence between the (empirical) data distribution and the model distribution; contrastive divergence only approximates that objective. The variable k is the number of Gibbs sampling steps run to produce the negative-phase sample, not the number of times the algorithm itself is run. In practice, k = 1 often works surprisingly well.
In other words, the gradients are approximated by the difference between two parts: the gradients associated with the positive phase and the gradients associated with the negative phase. The terms "positive" and "negative" do not reflect the sign of each term but rather ...
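To make the two phases concrete, here is a minimal sketch in Python with NumPy. It assumes a Bernoulli restricted Boltzmann machine, which is one common setting for CD-k but is an assumption here. The function name cd_k_gradients and the parameter names W, b, and c are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd_k_gradients(v0, W, b, c, k=1, rng=None):
    """Approximate log-likelihood gradients with CD-k.

    Assumes a Bernoulli RBM (an illustrative choice):
      v0: (batch, n_visible) binary data
      W:  (n_visible, n_hidden) weights
      b:  (n_visible,) visible bias
      c:  (n_hidden,) hidden bias
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities driven by the data (first encoding).
    ph0 = sigmoid(v0 @ W + c)
    h = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: k steps of alternating Gibbs sampling.
    vk, phk = v0, ph0
    for _ in range(k):
        pv = sigmoid(h @ W.T + b)
        vk = (rng.random(pv.shape) < pv).astype(float)
        phk = sigmoid(vk @ W + c)  # last encoding after the final step
        h = (rng.random(phk.shape) < phk).astype(float)

    # Gradient estimate: positive-phase statistics minus negative-phase ones.
    n = v0.shape[0]
    dW = (v0.T @ ph0 - vk.T @ phk) / n
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc
```

Under these assumptions, a training step would add a small multiple of each returned gradient to the matching parameter (for example W += 0.1 * dW, where 0.1 is an arbitrary learning rate), since the estimates point in the direction of increasing log-likelihood.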