Training a deep neural network is challenging and time-consuming because the objective function is non-convex. Several obstacles can significantly delay convergence, trap the optimizer in a poor optimum, or cause oscillation or divergence:
- Local minima can prevent convergence to a global optimum and cause poor performance.
- Flat regions with near-zero gradients that are not local minima (plateaus and saddle points) can also stall convergence while most likely being distant from the global optimum.
- Steep regions with very large gradients, which can result from multiplying several large weights, can cause excessively large parameter updates that overshoot the target.
- Deep architectures or the modeling of long-term dependencies in RNNs (see the next chapter) can require the multiplication of many per-layer gradient factors via the chain rule; when these factors are consistently smaller or larger than one, the overall gradient can vanish or explode, as the sketch after this list illustrates.
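To make the last two points concrete, the NumPy sketch below (an illustration, not a method from this text) multiplies a chain of random Jacobian-like matrices, standing in for the per-layer factors that backpropagation multiplies together, and tracks the norm of the running product. The helper name and the `scale` parameter are hypothetical; the point is that per-layer factors slightly below one collapse the product toward zero (vanishing gradients), while factors slightly above one blow it up (exploding gradients).

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_norm_through_depth(scale, depth=50, dim=10):
    """Multiply `depth` random Jacobian-like matrices, with entries
    scaled by `scale`, and record the norm of the running product."""
    product = np.eye(dim)
    norms = []
    for _ in range(depth):
        # Entries ~ N(0, scale^2 / dim), so each layer multiplies the
        # expected norm of the product by roughly `scale`.
        layer_jacobian = scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)
        product = layer_jacobian @ product
        norms.append(np.linalg.norm(product))
    return norms

# Factors below one shrink the gradient exponentially with depth
# (vanishing); factors above one grow it exponentially (exploding).
vanishing = gradient_norm_through_depth(scale=0.5)
exploding = gradient_norm_through_depth(scale=1.5)
print(f"norm after 50 layers, scale 0.5: {vanishing[-1]:.3e}")
print(f"norm after 50 layers, scale 1.5: {exploding[-1]:.3e}")
```

Running this shows the two failure modes side by side: with `scale=0.5` the norm decays to roughly 1e-16 after 50 layers, while with `scale=1.5` it grows past 1e8, mirroring how depth amplifies small per-layer imbalances.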