Getting ready...
Now that we have the basic architecture of VAEs, the question arises: how can they be trained, given that both the maximum likelihood of the training data and the posterior density are intractable? The network is trained by maximizing a lower bound on the log-likelihood of the data. The loss therefore consists of two components: the generation loss, which is obtained from the decoder network through sampling, and the KL divergence term, also called the latent loss.
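As a rough illustration of how the two loss components combine, here is a minimal NumPy sketch. It is not taken from this recipe: the function name `vae_loss`, its argument names, the Bernoulli (cross-entropy) form of the generation loss, and the standard normal prior used for the closed-form KL term are all assumptions made for the example.

```python
import numpy as np

def vae_loss(x, x_recon, z_mean, z_log_var, eps=1e-7):
    """Hypothetical helper: per-example VAE loss terms and their batch mean.

    Assumes pixel values in [0, 1], a Bernoulli decoder, a diagonal
    Gaussian encoder, and a standard normal prior on z.
    """
    # Keep reconstructions away from 0 and 1 so the logs stay finite.
    x_recon = np.clip(x_recon, eps, 1.0 - eps)

    # Generation (reconstruction) loss: binary cross-entropy summed over pixels.
    generation_loss = -np.sum(
        x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon), axis=1
    )

    # Latent loss: closed-form KL divergence between N(z_mean, exp(z_log_var))
    # and N(0, I), summed over the latent dimensions.
    latent_loss = -0.5 * np.sum(
        1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=1
    )

    # Minimizing this sum corresponds to maximizing the lower bound.
    total_loss = np.mean(generation_loss + latent_loss)
    return generation_loss, latent_loss, total_loss
```

Each input is expected to be a 2-D array with one row per example. In a real training loop the same expressions would be written with the framework's tensor operations so that gradients can flow through them.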
The generation loss encourages the image generated by the decoder to match the image used to train the network, and the latent loss encourages the posterior distribution qᵩ(z|x) to stay close to the prior pϴ(z). Since the encoder uses a Gaussian distribution for sampling, the latent loss ...