Here, we will only discuss gradient-based optimization methods, which are the ones most commonly used in GANs. Different gradient methods have their own strengths and weaknesses, and there is no universal optimization method that can solve every problem. Therefore, we should choose wisely among them for different practical problems. Let's have a look at some now:
- SGD (calling optim.SGD with momentum=0 and nesterov=False): It is fast and works well for shallow networks. However, it can be very slow for deeper networks, and may not even converge for them:
$$\theta_{t+1} = \theta_t - \eta \nabla_{\theta} \mathcal{L}(\theta_t)$$

In this equation, $\theta_t$ is the parameters at iteration step $t$, $\eta$ is the learning rate, and $\nabla_{\theta} \mathcal{L}(\theta_t)$ is the gradient of the loss function with respect to the parameters.
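The update rule above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not code from the book: the function names `sgd_step` and `minimize`, and the 1-D quadratic loss $(\theta - 3)^2$, are assumptions chosen to keep the example self-contained. In a real PyTorch model you would instead use `torch.optim.SGD` with `momentum=0` and `nesterov=False`, as mentioned above.

```python
# Vanilla SGD sketch: theta_{t+1} = theta_t - lr * grad,
# demonstrated on the toy loss L(theta) = (theta - 3)^2,
# whose analytic gradient is 2 * (theta - 3).

def sgd_step(theta, grad, lr):
    # One plain SGD update: no momentum, no Nesterov acceleration.
    return theta - lr * grad

def minimize(theta=0.0, lr=0.1, steps=100):
    # Repeatedly apply the update; theta should approach the minimum at 3.0.
    for _ in range(steps):
        grad = 2.0 * (theta - 3.0)  # gradient of (theta - 3)^2
        theta = sgd_step(theta, grad, lr)
    return theta

final_theta = minimize()
```

With a small, well-chosen learning rate the iterate contracts toward the minimum on this convex toy problem; on deep networks, as noted above, plain SGD can be slow or fail to converge.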