December 2018
Beginner to intermediate
684 pages
21h 9m
English
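The gradient of the loss function with respect to the hidden layer values is computed as follows, where ∘ denotes the element-wise (Hadamard) product, H is the matrix of hidden layer activations, E is the loss gradient at the output layer, and W_o is the output weight matrix. The factor H ∘ (1 − H) is the derivative of a sigmoid activation:

$$\delta_h = H \circ (1 - H) \circ \left(E \, W_o^{T}\right)$$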
We define a hidden_layer_gradient function to encode this result, as follows:
def hidden_layer_gradient(H, out_weights, loss_grad):
    """Error at the hidden layer.
    H * (1-H) * (E . Wo^T)"""
    return H * (1 - H) * (loss_grad @ out_weights.T)
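A finite-difference comparison is one way to sanity-check a formula like this. The sketch below is illustrative only: the sigmoid activation, the toy loss (linear in the output-layer inputs, so its gradient there is a fixed matrix `G`), the array sizes, and the names `sigmoid`, `loss_from_Z`, `G`, `W_h`, and `W_o` are all assumptions introduced here, not part of the original text.

```python
import numpy as np


def sigmoid(z):
    # Assumed hidden activation; H * (1 - H) is its derivative.
    return 1.0 / (1.0 + np.exp(-z))


def hidden_layer_gradient(H, out_weights, loss_grad):
    """Error at the hidden layer: H * (1-H) * (E . Wo^T)."""
    return H * (1 - H) * (loss_grad @ out_weights.T)


rng = np.random.default_rng(0)
N, D, n_hidden, K = 5, 3, 4, 2           # hypothetical sizes
X = rng.normal(size=(N, D))
W_h = rng.normal(size=(D, n_hidden))
W_o = rng.normal(size=(n_hidden, K))
G = rng.normal(size=(N, K))              # stand-in for the output-layer loss gradient


def loss_from_Z(Z):
    # Toy loss, linear in the output-layer inputs H @ W_o, so dL/d(H @ W_o) == G.
    return np.sum((sigmoid(Z) @ W_o) * G)


Z = X @ W_h                              # hidden pre-activations (bias omitted)
H = sigmoid(Z)
analytic = hidden_layer_gradient(H, W_o, G)

# Central finite differences with respect to each hidden pre-activation.
eps = 1e-6
numeric = np.zeros_like(Z)
for i in range(N):
    for j in range(n_hidden):
        Z_plus, Z_minus = Z.copy(), Z.copy()
        Z_plus[i, j] += eps
        Z_minus[i, j] -= eps
        numeric[i, j] = (loss_from_Z(Z_plus) - loss_from_Z(Z_minus)) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(analytic.shape, max_abs_diff < 1e-6)
```

The check perturbs the hidden pre-activations rather than the activations themselves, because the `H * (1 - H)` factor already folds the activation derivative into the result.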
The gradients for hidden layer weights and biases are as follows:

The corresponding functions are as follows:
def hidden_weight_gradient ...
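As a hedged illustration of what such functions might look like, the sketch below assumes the hidden pre-activation is `X @ W_h + b_h`, with one sample per row of `X`. Under that assumption the chain rule gives a weight gradient of `X.T` times the hidden-layer error, and a bias gradient equal to the column-wise sum of that error. The function names carry a `_sketch` suffix and are hypothetical; they are not the original definitions.

```python
import numpy as np


def hidden_weight_gradient_sketch(X, hidden_layer_grad):
    # Assumed form: dL/dW_h = X^T . delta_h, for pre-activation Z_h = X @ W_h + b_h.
    return X.T @ hidden_layer_grad


def hidden_bias_gradient_sketch(hidden_layer_grad):
    # Assumed form: the bias is shared by all samples, so sum the error over rows.
    return np.sum(hidden_layer_grad, axis=0, keepdims=True)


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))          # 6 samples, 3 input features (hypothetical)
delta_h = rng.normal(size=(6, 4))    # stand-in hidden-layer error, 4 hidden units

grad_W = hidden_weight_gradient_sketch(X, delta_h)
grad_b = hidden_bias_gradient_sketch(delta_h)
print(grad_W.shape, grad_b.shape)    # (3, 4) (1, 4)
```

Each gradient has the same shape as the parameter it updates, which is a quick consistency check before plugging the functions into a training loop.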