Computing the gradient of the loss function with respect to $U$ follows the same procedure as for $W$, since here too we take the sequential derivative of $h_t$. As with $W$, to compute the derivative of any loss $L_j$ with respect to $U$, we need to traverse all the way back to $h_0$.
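The backward traversal described above can be illustrated with a minimal sketch for a scalar recurrent network. Everything in it is an assumption made for illustration rather than something taken from the text: the recurrence `h_t = tanh(u * x_t + w * h_{t-1})`, the output `y_t = v * h_t`, the squared-error loss, and the function names `forward`, `loss`, and `grad_u`. The point is only to show that the gradient with respect to the input weight accumulates a contribution from every time step while walking back to the initial hidden state.

```python
import math

def forward(u, w, v, xs, h0=0.0):
    """Scalar RNN (assumed form): h_t = tanh(u*x_t + w*h_{t-1}), y_t = v*h_t."""
    hs = [h0]                       # hs[0] is the initial hidden state
    for x in xs:
        hs.append(math.tanh(u * x + w * hs[-1]))
    ys = [v * h for h in hs[1:]]
    return hs, ys

def loss(u, w, v, xs, targets):
    """Sum of squared errors over all time steps (assumed loss)."""
    _, ys = forward(u, w, v, xs)
    return sum(0.5 * (y - t) ** 2 for y, t in zip(ys, targets))

def grad_u(u, w, v, xs, targets):
    """Gradient of the total loss w.r.t. u via backpropagation through time."""
    hs, ys = forward(u, w, v, xs)
    grad = 0.0
    dh_next = 0.0                   # gradient arriving at h_t from step t+1
    for t in reversed(range(len(xs))):
        # hs[t + 1] is the hidden state produced at step t
        dh = (ys[t] - targets[t]) * v + dh_next
        dpre = dh * (1.0 - hs[t + 1] ** 2)   # back through tanh
        grad += dpre * xs[t]                 # direct dependence on u at step t
        dh_next = dpre * w                   # pass gradient back to h_{t-1}
    return grad

# Hypothetical toy data, chosen only to exercise the code
xs = [0.5, -1.0, 0.8]
targets = [0.2, -0.1, 0.4]
u, w, v = 0.7, -0.3, 1.2

analytic = grad_u(u, w, v, xs, targets)
eps = 1e-6
numeric = (loss(u + eps, w, v, xs, targets)
           - loss(u - eps, w, v, xs, targets)) / (2 * eps)
print(analytic, numeric)
```

The central-difference estimate `numeric` serves as a sanity check: if the backward loop skipped the `dh_next` term, and so stopped short of the initial hidden state, the two values would generally disagree.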
The final equation for computing the gradient ...