Just like traditional neural networks, training an RNN involves backpropagation. The difference is that, because the parameters are shared across all time steps, the gradient at each output depends not only on the current time step but also on all previous ones. This process is called backpropagation through time (BPTT) (for more information refer to the article: Learning Internal Representations by Error Propagation, by D. E. Rumelhart, G. E. Hinton, and R. J. Williams, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, 1986). In practice, the network is unrolled over the time steps of the input sequence, ordinary backpropagation is applied to the unrolled graph, and the gradients computed at each step are summed into a single update for the shared weights.

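To make this concrete, here is a minimal NumPy sketch of BPTT (not code from this book): a vanilla RNN with a tanh hidden layer and a squared-error loss, where the function name bptt and the weight-matrix names are purely illustrative. The backward loop shows the key point of the preceding paragraph: the gradients of the shared weight matrices are accumulated over every time step.

```python
import numpy as np

def bptt(xs, ts, W_xh, W_hh, W_hy):
    """Backpropagation through time for a minimal vanilla RNN (illustrative sketch).

    xs: list of input vectors, ts: list of target vectors (one per time step).
    Returns the gradients of the three shared weight matrices.
    """
    # --- Forward pass: store all hidden states for reuse in the backward pass ---
    hs = {-1: np.zeros(W_hh.shape[0])}       # initial hidden state is zero
    ys = {}
    for t in range(len(xs)):
        hs[t] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t - 1])
        ys[t] = W_hy @ hs[t]

    # --- Backward pass: because the same W_xh, W_hh, W_hy are reused at every
    # step, their gradients are *summed* over all time steps ---
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_hy = np.zeros_like(W_hy)
    dh_next = np.zeros(W_hh.shape[0])        # gradient flowing back from step t+1
    for t in reversed(range(len(xs))):
        dy = ys[t] - ts[t]                   # gradient of the squared-error loss
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next           # from current output + later steps
        draw = (1.0 - hs[t] ** 2) * dh       # backprop through tanh
        dW_xh += np.outer(draw, xs[t])
        dW_hh += np.outer(draw, hs[t - 1])
        dh_next = W_hh.T @ draw              # pass the gradient on to step t-1
    return dW_xh, dW_hh, dW_hy
```

Note how dh_next carries the gradient from step t+1 back into step t: in long sequences this repeated multiplication by W_hh.T is exactly what gives rise to the well-known vanishing and exploding gradient problems.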
Consider the small three-layer ...