Deep Learning with TensorFlow - Second Edition by Md. Rezaul Karim, Giancarlo Zaccone

RNN and the gradient vanishing-exploding problem

Gradients for deeper layers are calculated as products of many gradients of activation functions in the multi-layer network. When those factors are small or zero, the product quickly shrinks toward zero and the gradient vanishes; when they are bigger than 1, the product can grow without bound and the gradient explodes. Either way, the gradient becomes very hard to compute reliably and the resulting weight updates stop being useful.
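
The effect is easy to see with plain arithmetic. The short sketch below (plain Python; the step count and the per-step factors 0.9 and 1.1 are illustrative assumptions, not values from the book) multiplies one per-step gradient factor across 50 unrolled steps:

    T = 50  # hypothetical number of unrolled time steps

    vanishing = 0.9 ** T   # per-step factor < 1: product ~ 0.005, the signal disappears
    exploding = 1.1 ** T   # per-step factor > 1: product ~ 117.4, the signal blows up

    print(vanishing, exploding)

Fifty steps is enough to shrink the signal by a factor of roughly 200 in one case and amplify it more than a hundredfold in the other.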

Let's explain them in more detail:

  • If the weights are small, the gradient signal shrinks as it is propagated back through each step, eventually becoming so small that learning either slows to a crawl or stops working altogether. This is referred to as vanishing gradients.
  • If the weights in the recurrent weight matrix are large, the gradient signal grows as it is propagated back through each step, eventually becoming so large that learning diverges. This is referred to as exploding gradients; the sketch after this list demonstrates both failure modes.
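
The same behavior shows up in TensorFlow's automatic differentiation. The following is a minimal sketch, not code from the book: it unrolls a toy one-unit linear recurrence (the function name first_step_gradient, the step count, and the weight values are all assumptions made here for illustration) and measures the gradient of the final state with respect to the very first input:

    import tensorflow as tf

    # Toy one-unit recurrence h_t = w * h_{t-1}, unrolled for T steps.
    # A real RNN also applies an activation such as tanh, whose derivative
    # is at most 1, which makes vanishing even more likely.
    def first_step_gradient(w_rec, T=50):
        w = tf.constant(w_rec, dtype=tf.float32)   # recurrent weight
        x0 = tf.constant(1.0)                      # the very first input
        with tf.GradientTape() as tape:
            tape.watch(x0)                         # track gradients w.r.t. x0
            h = x0
            for _ in range(T - 1):
                h = w * h                          # linear recurrence step
            loss = h                               # final state stands in for the loss
        return tape.gradient(loss, x0).numpy()

    print(first_step_gradient(0.9))  # ~0.006 -> vanishing gradient
    print(first_step_gradient(1.1))  # ~107   -> exploding gradient

Switching the recurrent weight between 0.9 and 1.1 is enough to flip between the two failure modes described in the list above.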
