Now we will see how to calculate the gradients of the loss with respect to the hidden-to-hidden layer weights, $W$, for all the gates and the candidate state.
Let's calculate the gradient of the loss with respect to $W_i$, the hidden-to-hidden weights of the input gate.
Recall the equation of the input gate, which is given as follows:

$$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i)$$
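Thus, by the chain rule, we can write the following:

$$\frac{\partial L}{\partial W_i} = \frac{\partial L}{\partial i_t} \cdot \frac{\partial i_t}{\partial W_i}$$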
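The chain-rule factorization can be checked numerically for a single time step. The following is a minimal NumPy sketch, not a full backpropagation-through-time implementation. It assumes a column-vector convention in which the input gate is the sigmoid of `U_i @ x_t + W_i @ h_prev + b_i`, and it uses a toy loss `L = c . i_t` so that the gradient of the loss with respect to the gate is simply `c`. The names `U_i`, `W_i`, `b_i`, `h_prev`, `c`, and the helper functions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(U_i, W_i, b_i, x_t, h_prev):
    # Assumed convention: column vectors, weights multiply from the left.
    return sigmoid(U_i @ x_t + W_i @ h_prev + b_i)

def grad_W_i(dL_di, i_t, h_prev):
    # First factor: dL/di_t. Second factor: the sigmoid derivative
    # i_t * (1 - i_t) combined with the pre-activation's dependence on W_i,
    # which is h_prev. The result has the same shape as W_i.
    delta = dL_di * i_t * (1.0 - i_t)
    return np.outer(delta, h_prev)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
U_i = rng.normal(size=(n_hid, n_in))
W_i = rng.normal(size=(n_hid, n_hid))
b_i = rng.normal(size=n_hid)
x_t = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)
c = rng.normal(size=n_hid)  # toy loss L = c . i_t, so dL/di_t = c

i_t = input_gate(U_i, W_i, b_i, x_t, h_prev)
analytic = grad_W_i(c, i_t, h_prev)

# Central finite differences on every entry of W_i as an independent check.
eps = 1e-6
numeric = np.zeros_like(W_i)
for r in range(n_hid):
    for k in range(n_hid):
        W_plus = W_i.copy()
        W_minus = W_i.copy()
        W_plus[r, k] += eps
        W_minus[r, k] -= eps
        L_plus = c @ input_gate(U_i, W_plus, b_i, x_t, h_prev)
        L_minus = c @ input_gate(U_i, W_minus, b_i, x_t, h_prev)
        numeric[r, k] = (L_plus - L_minus) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_abs_diff)
```

In a full LSTM the first factor would come from the cell-state path rather than from a toy loss, and the contributions from every time step would be summed. The sketch only covers the local step that the chain rule isolates.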
Let's calculate each of the terms in the preceding equation.
We have already seen how to compute the ...