In theory, RNNs can make use of information in arbitrarily long sequences. However, in practice, they are limited to looking back only a few steps.
More specifically, RNNs struggle to derive useful context from time steps that lie far from the current observation. The fundamental problem is the effect that backpropagating over many time steps has on the gradients: due to repeated multiplication by the recurrent weights, the gradients tend to either vanish, that is, decrease toward zero (the typical case), or explode, that is, grow toward infinity (less frequent, but it renders optimization very difficult).
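A minimal sketch of the repeated-multiplication effect (not from the book): a scalar weight stands in for the recurrent Jacobian, and the weights 0.5 and 1.5 and the 50-step horizon are hypothetical values chosen purely for illustration.

```python
# Sketch: backpropagation through time multiplies the gradient by the
# recurrent weight once per time step. A scalar stands in for the Jacobian.
for w in (0.5, 1.5):          # hypothetical recurrent weights
    grad = 1.0
    for _ in range(50):       # 50 hypothetical time steps
        grad *= w             # one multiplication per step of backprop
    print(f"w={w}: gradient after 50 steps = {grad:.3e}")

# Output:
# w=0.5: gradient after 50 steps = 8.882e-16   (vanishing)
# w=1.5: gradient after 50 steps = 6.376e+08   (exploding)
```

With a weight below one the gradient decays exponentially, so time steps far in the past contribute almost nothing to the update; with a weight above one it grows exponentially, destabilizing optimization.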
Even if the parameters are such that stability is not at risk and the network is able to store memories, ...