Just like traditional neural networks, training an RNN involves backpropagation. The difference is that, because the parameters are shared across all time steps, the gradient at each output depends not only on the current time step but also on all previous ones. This process is called backpropagation through time (BPTT) (for more information refer to the article: Learning Internal Representations by Error Propagation, by D. E. Rumelhart, G. E. Hinton, and R. J. Williams, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, 1986). In practice, the network is unrolled over the time steps of the input sequence, ordinary backpropagation is applied to the unrolled graph, and the gradients computed at each step are summed into a single update for the shared weights.

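To make this concrete, here is a minimal NumPy sketch of BPTT (not code from this book): a vanilla RNN with a tanh hidden layer and a squared-error loss, where the function name bptt and the weight-matrix names are purely illustrative. The backward loop shows the key point of the preceding paragraph: the gradients of the shared weight matrices are accumulated over every time step.

```python
import numpy as np

def bptt(xs, ts, W_xh, W_hh, W_hy):
    """Backpropagation through time for a minimal vanilla RNN (illustrative sketch).

    xs: list of input vectors, ts: list of target vectors (one per time step).
    Returns the gradients of the three shared weight matrices.
    """
    # --- Forward pass: store all hidden states for reuse in the backward pass ---
    hs = {-1: np.zeros(W_hh.shape[0])}       # initial hidden state is zero
    ys = {}
    for t in range(len(xs)):
        hs[t] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t - 1])
        ys[t] = W_hy @ hs[t]

    # --- Backward pass: because the same W_xh, W_hh, W_hy are reused at every
    # step, their gradients are *summed* over all time steps ---
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dW_hy = np.zeros_like(W_hy)
    dh_next = np.zeros(W_hh.shape[0])        # gradient flowing back from step t+1
    for t in reversed(range(len(xs))):
        dy = ys[t] - ts[t]                   # gradient of the squared-error loss
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next           # from current output + later steps
        draw = (1.0 - hs[t] ** 2) * dh       # backprop through tanh
        dW_xh += np.outer(draw, xs[t])
        dW_hh += np.outer(draw, hs[t - 1])
        dh_next = W_hh.T @ draw              # pass the gradient on to step t-1
    return dW_xh, dW_hh, dW_hy
```

Note how dh_next carries the gradient from step t+1 back into step t: in long sequences this repeated multiplication by W_hh.T is exactly what gives rise to the well-known vanishing and exploding gradient problems.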
Consider the small three-layer ...