Gradients with respect to U

Let's calculate the gradients of the loss with respect to the input-to-hidden layer weights, U, for all the gates and the candidate state. Computing the gradients of the loss with respect to U is exactly the same as computing the gradients with respect to W, except that the last term will be x_t instead of h_{t-1}. Let's examine what we mean by that. ...
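To make the idea concrete, here is a minimal NumPy sketch for a single sigmoid gate. It is not the book's own code and rests on several assumptions: the gate has the common form sigmoid(U x_t + W h_{t-1} + b), where U multiplies the input and W multiplies the previous hidden state; the names `gate_forward`, `gate_weight_grads`, and `numerical_grad` are hypothetical; and the loss is a toy linear function of the gate, chosen only so the result can be checked by finite differences. Under those assumptions, the two weight gradients share the same pre-activation term and differ only in the final factor.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gate_forward(U, W, b, x_t, h_prev):
    # Assumed gate form: sigmoid(U x_t + W h_{t-1} + b).
    return sigmoid(U @ x_t + W @ h_prev + b)


def gate_weight_grads(U, W, b, x_t, h_prev, d_gate):
    """Return (dL/dU, dL/dW) given d_gate = dL/d(gate output)."""
    g = gate_forward(U, W, b, x_t, h_prev)
    # Gradient at the pre-activation; this term is shared by U and W.
    delta = d_gate * g * (1.0 - g)
    dU = np.outer(delta, x_t)     # last term is the input x_t
    dW = np.outer(delta, h_prev)  # last term is the previous hidden state
    return dU, dW


def numerical_grad(param, loss_fn, eps=1e-6):
    """Central finite differences, perturbing `param` in place."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = loss_fn()
        param[idx] = old - eps
        minus = loss_fn()
        param[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2.0 * eps)
    return grad


# Illustrative sizes and random values (not taken from the source).
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
U = rng.normal(size=(n_hid, n_in))
W = rng.normal(size=(n_hid, n_hid))
b = rng.normal(size=n_hid)
x_t = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)
d_gate = rng.normal(size=n_hid)  # stand-in for the upstream gradient


def toy_loss():
    # Linear in the gate output, so dL/d(gate) equals d_gate exactly.
    return float(d_gate @ gate_forward(U, W, b, x_t, h_prev))


dU, dW = gate_weight_grads(U, W, b, x_t, h_prev, d_gate)
dU_num = numerical_grad(U, toy_loss)
dW_num = numerical_grad(W, toy_loss)

print("max |dU - dU_num|:", np.abs(dU - dU_num).max())
print("max |dW - dW_num|:", np.abs(dW - dW_num).max())
```

In a full recurrent cell the same pattern would apply to each gate and to the candidate state, with the per-time-step contributions summed over the sequence; this sketch covers only a single gate at a single time step.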
