Gradients with respect to hidden-to-hidden layer weights, W

Now, we will compute the gradients of the loss with respect to the hidden-to-hidden layer weights, $W$. Similar to $V$, the final gradient is the sum of the gradients at all time steps.

So, we can write:

$$\frac{\partial L}{\partial W} = \sum_{j} \frac{\partial L_j}{\partial W}$$

First, let's compute the gradient of the loss $L_j$ with respect to $W$, that is, $\frac{\partial L_j}{\partial W}$. By the chain rule, we have:

$$\frac{\partial L_j}{\partial W} = \frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial h_j} \frac{\partial h_j}{\partial W}$$

We cannot compute the derivative of $h_j$ with respect to $W$ directly, because $h_j$ depends on the previous hidden state $h_{j-1}$, which in turn also depends on $W$. So, we apply the chain rule over all of the previous hidden states and sum the contributions from every earlier time step:

$$\frac{\partial L_j}{\partial W} = \sum_{k=1}^{j} \frac{\partial L_j}{\partial \hat{y}_j} \frac{\partial \hat{y}_j}{\partial h_j} \frac{\partial h_j}{\partial h_k} \frac{\partial h_k}{\partial W}$$
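To make the idea concrete, here is a minimal numerical sketch of this backpropagation-through-time sum for $W$, using a hypothetical scalar RNN ($h_t = \tanh(u x_t + w h_{t-1})$, $y_t = v h_t$, squared-error loss); all of the names (`u`, `v`, `w`, `xs`, `ts`) are illustrative, not from the book. The BPTT gradient is verified against a finite-difference estimate of the loss:

```python
import math

def forward(u, v, w, xs, h0=0.0):
    """Run the scalar RNN forward; return hidden states (with h0) and outputs."""
    hs, ys = [h0], []
    for x in xs:
        h = math.tanh(u * x + w * hs[-1])
        hs.append(h)
        ys.append(v * h)
    return hs, ys

def loss(u, v, w, xs, ts):
    """Squared-error loss summed over all time steps."""
    _, ys = forward(u, v, w, xs)
    return sum(0.5 * (y - t) ** 2 for y, t in zip(ys, ts))

def grad_w_bptt(u, v, w, xs, ts):
    """dL/dw via BPTT: outer sum over time steps j, inner sum over k <= j."""
    hs, ys = forward(u, v, w, xs)
    total = 0.0
    for j in range(len(xs)):
        # dL_j/dy_j * dy_j/dh_j
        dLj_dhj = (ys[j] - ts[j]) * v
        for k in range(j + 1):
            # dh_j/dh_k = product over m = k+1..j of w * (1 - h_m^2)
            dhj_dhk = 1.0
            for m in range(k + 1, j + 1):
                dhj_dhk *= w * (1.0 - hs[m + 1] ** 2)
            # dh_k/dw, treating h_{k-1} as a constant at step k
            dhk_dw = (1.0 - hs[k + 1] ** 2) * hs[k]
            total += dLj_dhj * dhj_dhk * dhk_dw
    return total

u, v, w = 0.5, 0.8, 0.3
xs = [1.0, -0.5, 0.25]
ts = [0.2, 0.0, -0.1]
analytic = grad_w_bptt(u, v, w, xs, ts)
eps = 1e-6
numeric = (loss(u, v, w + eps, xs, ts) - loss(u, v, w - eps, xs, ts)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)
```

The inner loop over `k` is exactly the summation over previous time steps: each term propagates the error at step `j` back through the recurrence to the point where $W$ was applied at step `k`.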
