Before going ahead, let's get familiar with the notations:
- Let's represent the distribution of the input dataset by , where represents the parameter of the network that will be learned during training
- We represent the latent variable by , which encodes all the properties of the input by sampling from the distribution
- denotes ...