Weight clipping can lead to problems with gradient stability. Instead, researchers suggest adding a gradient penalty to the critic's loss function, which indirectly constrains the critic's gradient to have a norm close to 1. Interestingly, back in 2013, researchers had already proposed a way to encourage neural networks to be k-Lipschitz by penalizing the objective function with the operator norm of each layer's weights. The preceding equation thus becomes the following (taken from the paper Improved Training of Wasserstein GANs [59]):
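\[
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big] \;-\; \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big] \;+\; \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
\]

Here, $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated samples, and $\lambda$ (set to 10 in the paper) controls the strength of the penalty.

To make the penalty term concrete, the following is a minimal sketch of how it could be computed, assuming a PyTorch setup; the function name `gradient_penalty` and the framework choice are illustrative assumptions, not code from the paper.

```python
import torch


def gradient_penalty(critic, real, fake, device="cpu"):
    """Illustrative WGAN-GP penalty: pushes the critic's gradient norm toward 1."""
    batch_size = real.size(0)

    # One random interpolation coefficient per example, broadcastable
    # over the remaining (channel/spatial) dimensions.
    eps = torch.rand(batch_size, *([1] * (real.dim() - 1)), device=device)

    # Points sampled along straight lines between real and generated samples.
    interpolated = eps * real.detach() + (1.0 - eps) * fake.detach()
    interpolated.requires_grad_(True)

    # Critic scores at the interpolated points.
    scores = critic(interpolated)

    # Gradient of the scores with respect to the interpolated inputs.
    gradients = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]

    # Penalize any deviation of the per-example gradient norm from 1.
    gradients = gradients.reshape(batch_size, -1)
    return ((gradients.norm(2, dim=1) - 1.0) ** 2).mean()
```

In a critic update, this term would typically be added to the usual Wasserstein loss, for example `critic_loss = fake_scores.mean() - real_scores.mean() + 10.0 * gradient_penalty(critic, real_batch, fake_batch)`, with the coefficient of 10 taken from the paper's recommended setting.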
The following screenshots are from scenes in the Improved ...