December 2017
Intermediate to advanced
536 pages
14h 23m
English
Training an RNN is hard because of two stability problems. Due to the feedback loop, the gradient can quickly diverge to infinity, or it can rapidly vanish to 0. In both cases, as illustrated in the following figure, the network will stop learning anything useful. The exploding gradient problem can be tackled with a relatively simple solution based on gradient clipping. The vanishing gradient problem is more difficult to solve, and it requires more complex basic RNN cells, such as Long Short-Term Memory (LSTM) cells or Gated Recurrent Units (GRUs). Let's first discuss exploding gradients and gradient clipping:
Gradient clipping consists of imposing a maximum value to ...
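As a minimal sketch of the idea, assuming plain Python lists stand in for gradient tensors, the following shows two common ways of capping a gradient: rescaling it so that its L2 norm stays below a threshold, and clamping each component to a fixed range. The helper names `clip_by_norm` and `clip_by_value` and the threshold values are hypothetical, chosen only for illustration; they are not taken from any particular library.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its L2 norm does not exceed max_norm.

    The direction of the gradient is preserved; only its length changes.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]


def clip_by_value(grad, limit):
    """Clamp every component of grad to the range [-limit, limit].

    This can change the direction of the gradient, not only its length.
    """
    return [max(-limit, min(limit, g)) for g in grad]


# An "exploding" gradient with L2 norm 50 (illustrative numbers).
exploding = [30.0, -40.0]

by_norm = clip_by_norm(exploding, max_norm=5.0)
by_value = clip_by_value(exploding, limit=1.0)

print("clipped by norm: ", by_norm)
print("clipped by value:", by_value)
```

Clipping by norm is often preferred for RNNs because it keeps the update direction intact, while clipping by value is simpler but may distort it. In a real training loop, the clipped gradient would be passed to the optimizer in place of the raw one.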