Vanishing gradient and LTSM
Similar to all deep architectures, the deeper the networks get, the more severe the vanishing gradient problem gets. What’s happening is that the weights at the beginning of the network change less and less. Given that the network's weights are generated randomly, with non-moving weights, we are learning very little from the data. This so-called vanishing gradient problem also affects RNN.
Each of the time steps in RNN can be thought of as a layer. Then, during backpropagation, errors are going from one time step to the previous one. So the network can be thought of as being as deep as the number of time steps. In many practical problems, such as word sentences, paragraphs, or other time-series data, the sequences ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access