Now we will see how to calculate the gradients of the loss with respect to the hidden-to-hidden layer weights for all the gates and the content state.
Let's start with the gradient of the loss with respect to the hidden-to-hidden weights of the reset gate.
Recall the equation of the reset gate, which is given as follows:
Using the chain rule, we can write the following:
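For concreteness, the two equations can be sketched in one common GRU notation. All symbols here are assumptions rather than notation confirmed by this passage: $r_t$ is the reset gate at time step $t$, $x_t$ the input, $h_{t-1}$ the previous hidden state, $U_r$ the input-to-hidden weights, $W_r$ the hidden-to-hidden weights, $\sigma$ the sigmoid function, and $L$ the loss.

$$
r_t = \sigma\left(U_r x_t + W_r h_{t-1}\right)
$$

$$
\frac{\partial L}{\partial W_r} = \frac{\partial L}{\partial r_t} \cdot \frac{\partial r_t}{\partial W_r}
$$

Under this notation, the first term is the gradient of the loss with respect to the reset gate, and the second term is the gradient of the reset gate with respect to its hidden-to-hidden weights.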
Let's calculate each of the terms in the preceding equation. The first term we already calculated ...
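As a sanity check on this chain-rule step, the following is a minimal NumPy sketch, not the book's implementation. It assumes the reset gate has the form `r_t = sigmoid(U_r @ x_t + W_r @ h_prev)`, where `W_r` is the hidden-to-hidden weight matrix. The names `reset_gate`, `grad_W_r`, and `toy_loss` are hypothetical, and the vector `g` is a stand-in for the gradient of the loss with respect to the reset gate, which in a real GRU would come from the earlier gate derivation. The sketch compares the analytic gradient against central finite differences for a single time step.

```python
import numpy as np


def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))


def reset_gate(U_r, W_r, x_t, h_prev):
    """Assumed reset gate form: r_t = sigmoid(U_r x_t + W_r h_prev)."""
    return sigmoid(U_r @ x_t + W_r @ h_prev)


def grad_W_r(dL_dr, r_t, h_prev):
    """Chain rule for one time step.

    dL_dr * r_t * (1 - r_t) is the gradient at the gate's pre-activation
    (the sigmoid derivative is r_t * (1 - r_t)); the outer product with
    h_prev then gives the gradient for every entry of W_r.
    """
    return np.outer(dL_dr * r_t * (1.0 - r_t), h_prev)


rng = np.random.default_rng(0)
n_in, n_hid = 3, 4  # arbitrary toy sizes
U_r = rng.normal(size=(n_hid, n_in))
W_r = rng.normal(size=(n_hid, n_hid))
x_t = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)
g = rng.normal(size=n_hid)  # stand-in for dL/dr_t (assumption)


def toy_loss(W):
    """Toy loss L = g . r_t, chosen so that dL/dr_t equals g exactly."""
    return float(g @ reset_gate(U_r, W, x_t, h_prev))


r_t = reset_gate(U_r, W_r, x_t, h_prev)
analytic = grad_W_r(g, r_t, h_prev)

# Central finite differences over every entry of W_r.
eps = 1e-6
numeric = np.zeros_like(W_r)
for i in range(n_hid):
    for j in range(n_hid):
        W_plus = W_r.copy()
        W_plus[i, j] += eps
        W_minus = W_r.copy()
        W_minus[i, j] -= eps
        numeric[i, j] = (toy_loss(W_plus) - toy_loss(W_minus)) / (2 * eps)

max_abs_diff = float(np.max(np.abs(analytic - numeric)))
print("max abs difference:", max_abs_diff)
```

Because the same hidden-to-hidden weights are reused at every time step, a full backpropagation-through-time gradient would sum contributions like this one over all time steps; the sketch covers only a single step.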