Given a probability distribution p(x), the set Dp = {x : p(x) > 0} is called the support of p(x). If two distributions, p(x) and q(x), have disjoint supports (that is, Dp ∩ Dq = ∅), the Jensen-Shannon divergence equals log(2). This means that the gradient is null, and no further corrections can be applied. In a generic scenario involving a GAN, it's extremely unlikely that pg(x) and pdata(x) overlap fully (although a minimal overlap can be expected); therefore, the gradients are very small, and so are the updates to the weights. Such a problem can block the training process and trap the GAN in a suboptimal configuration that it cannot escape. For this reason, Arjovsky, Chintala, and Bottou (in Wasserstein GAN, ...
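As a quick check of the log(2) saturation, the following is a minimal numerical sketch (not taken from the text): the function js_divergence and the two toy discrete distributions p and q are illustrative choices, chosen only so that their supports are disjoint over the same domain.

import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    # KL(p || m) and KL(q || m); terms where p(x) = 0 contribute 0 by convention
    kl_pm = np.sum(np.where(p > 0, p * np.log(p / (m + eps)), 0.0))
    kl_qm = np.sum(np.where(q > 0, q * np.log(q / (m + eps)), 0.0))
    return 0.5 * kl_pm + 0.5 * kl_qm

# Two distributions whose supports are disjoint (Dp ∩ Dq = ∅)
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])

print(js_divergence(p, q))  # ≈ 0.6931
print(np.log(2))            # the divergence saturates at log(2)

Because the mixture m(x) = 0.5 (p(x) + q(x)) reduces to 0.5 p(x) wherever p(x) > 0 (and symmetrically for q), each KL term evaluates to log(2), so the divergence is constant regardless of how far apart the two supports are, and its gradient with respect to the generator's parameters vanishes.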