Training GAN models has some major pitfalls:
- The gradient descent algorithm is designed to find a minimum of the loss function, rather than a Nash equilibrium of the minimax game between the two networks, and these are not the same thing. As a result, training may fail to converge and oscillate instead.
- Recall that the discriminator output is a sigmoid that represents the probability that the example is real. If the discriminator becomes too good at this task, its output will saturate near 0 or 1 on every training sample. The error gradient then becomes close to 0, which prevents the generator from learning anything (the vanishing gradient problem). On the other hand, if the discriminator is bad at recognizing fakes ...
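The convergence pitfall above can be illustrated with a toy minimax game: for f(x, y) = x·y the Nash equilibrium is (0, 0), yet naive simultaneous gradient descent (for the minimizing player) and ascent (for the maximizing player) spirals away from it instead of converging. This is a minimal sketch, not a GAN per se, but the same dynamics underlie GAN training:

```python
import numpy as np

# Toy two-player game f(x, y) = x * y.
# x tries to minimize f, y tries to maximize f; the Nash equilibrium is (0, 0).
eta = 0.1          # step size for both players
x, y = 1.0, 1.0    # start away from the equilibrium
radii = []
for _ in range(100):
    gx, gy = y, x                        # df/dx = y, df/dy = x
    x, y = x - eta * gx, y + eta * gy    # simultaneous descent / ascent
    radii.append(np.hypot(x, y))         # distance to the equilibrium (0, 0)

# Each update multiplies the distance by sqrt(1 + eta^2) > 1,
# so the iterates rotate around (0, 0) while drifting outward.
print(f"start radius: {radii[0]:.4f}, final radius: {radii[-1]:.4f}")
```

Each step applies the linear map [[1, -eta], [eta, 1]], whose determinant 1 + eta² exceeds 1, so the trajectory oscillates around the equilibrium while its radius grows rather than shrinking.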
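The vanishing gradient problem can also be made concrete. For the original minimax generator loss log(1 − D(G(z))), where D = sigmoid(logit), the gradient with respect to the discriminator's logit is −D. A confident discriminator assigns D ≈ 0 to fake samples, so that gradient shrinks toward 0 and the generator receives almost no learning signal. A minimal numerical sketch (logit values chosen for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generator loss on a fake sample under the minimax objective:
#   L = log(1 - D), with D = sigmoid(logit).
# Analytically, dL/dlogit = -sigmoid(logit) = -D, so as the
# discriminator grows confident that the sample is fake (D -> 0),
# the gradient flowing back to the generator vanishes.
for logit in [0.0, -2.0, -5.0, -10.0]:
    d = sigmoid(logit)   # discriminator's probability that the sample is real
    grad = -d            # gradient of log(1 - d) w.r.t. the logit
    print(f"logit={logit:6.1f}  D={d:.5f}  |grad|={abs(grad):.5f}")
```

This is why the non-saturating generator loss −log(D(G(z))) is commonly used in practice: its gradient stays large precisely when the discriminator confidently rejects the generator's samples.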