Backward propagation

We minimize the loss function using the gradient descent algorithm: we propagate the error backward through the network, calculate the gradient of the loss function with respect to the weights, and update the weights according to the weight update rule.
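As a minimal sketch of that update rule (not the book's code; the names `alpha` for the learning rate and `grad_W` for the already-computed gradient are assumptions made here for illustration):

```python
import numpy as np

# Gradient descent weight update rule, sketched for a single weight matrix.
# `alpha` (learning rate) and `grad_W` (gradient of the loss with respect
# to the weights) are assumed names, not from the excerpt.
def gradient_descent_update(W, grad_W, alpha=0.01):
    """Move the weights a small step against the gradient of the loss."""
    return W - alpha * grad_W

# Usage: update a 3x2 weight matrix with a matching gradient.
W = np.random.randn(3, 2)
grad_W = np.random.randn(3, 2)
W = gradient_descent_update(W, grad_W)
```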

First, we compute the gradient of the loss with respect to the hidden-to-output layer weights. We cannot calculate the derivative of the loss with respect to those weights directly, because the loss function contains no such weight term, so we apply the chain rule.
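As a worked sketch under assumed notation (the excerpt's own symbols are not shown): let $W_{hy}$ denote the hidden-to-output weights, $a_1$ the hidden-layer activation, $z_2 = a_1 W_{hy} + b_y$ the output-layer pre-activation, $\hat{y}$ the prediction, and $J$ the loss. The chain rule then expands the required gradient as:

```latex
\frac{\partial J}{\partial W_{hy}}
  = \frac{\partial J}{\partial \hat{y}}
  \cdot \frac{\partial \hat{y}}{\partial z_2}
  \cdot \frac{\partial z_2}{\partial W_{hy}}
```

Each factor on the right-hand side can be computed directly: the first from the loss function, the second from the derivative of the output activation, and the third, under the assumed definition of $z_2$, is simply the hidden-layer activation $a_1$ (up to transposition in the matrix form).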
