The derivative of the cross-entropy loss function, J, with respect to each output-layer activation z_i, i = 1, ..., N, is a very simple expression (see notebook for details), shown on the left for scalar values and on the right in matrix notation:

$$\frac{\partial J}{\partial z_i} = \hat{y}_i - y_i \qquad\qquad \frac{\partial J}{\partial \mathbf{Z}} = \hat{\mathbf{Y}} - \mathbf{Y}$$
We define the loss_gradient function accordingly, as follows:
def loss_gradient(y_hat, y_true):
    """output layer gradient"""
    return y_hat - y_true
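As a quick illustration (a minimal sketch; the sample arrays below are hypothetical and not taken from the notebook), the function can be applied to a small batch of softmax outputs and one-hot encoded labels:

import numpy as np

# loss_gradient as defined above
def loss_gradient(y_hat, y_true):
    """output layer gradient"""
    return y_hat - y_true

# hypothetical softmax outputs for a batch of two samples, three classes
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])

# corresponding one-hot encoded true labels
y_true = np.array([[1., 0., 0.],
                   [0., 0., 1.]])

print(loss_gradient(y_hat, y_true))
# [[-0.3  0.2  0.1]
#  [ 0.1  0.6 -0.7]]

Each row of the result is the per-sample gradient: negative where the model underestimates the true class, positive elsewhere, which is exactly the error signal backpropagated from the output layer.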