The following are some of the ways in which the problem of exploding gradients can be solved:
- Certain gradient clipping techniques can be applied to solve this issue of exploding gradients.
- Another way to prevent this is by using truncated Backpropagation Through Time, where instead of starting the backpropagation at the last time step (or output layer), we can choose a smaller time step (say, 15) to start backpropagating. This means that the network will backpropagate through only the last 15 time steps at one instance and learn information related to those 15-time steps only. This is similar to feeding in mini batches of data to the network as it would become far too computationally expensive to compute the gradient over ...