In this section, we will develop an intuition for backpropagation, a method of computing gradients using the chain rule. Understanding this process and its subtleties is critical for being able to understand, and effectively develop, design, and debug neural networks.
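As a minimal sketch of this idea (the function and values here are illustrative, not from the text), consider the toy expression f(x, y, z) = (x + y) · z; the backward pass recovers each input's gradient by multiplying local derivatives along the chain rule:

```python
# Toy example: f(x, y, z) = (x + y) * z, computed as q = x + y, then f = q * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: compute the output from the inputs.
q = x + y            # q = 3.0
f = q * z            # f = -12.0

# Backward pass: apply the chain rule from the output back to the inputs.
df_dq = z            # ∂f/∂q = z
df_dz = q            # ∂f/∂z = q
df_dx = df_dq * 1.0  # ∂f/∂x = ∂f/∂q · ∂q/∂x, with ∂q/∂x = 1
df_dy = df_dq * 1.0  # ∂f/∂y = ∂f/∂q · ∂q/∂y, with ∂q/∂y = 1

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```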
In general, given a function f(x), where x is a vector of inputs, we want to compute the gradient of f at x, denoted by ∇f(x) (the symbol ∇ is pronounced nabla). In the case of neural networks, f is a loss function (L) and the input x is the combination of the weights and the training data:

(xᵢ, yᵢ), i = 1, …, N
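To make ∇L concrete, here is a small sketch (the squared-error loss and all names are assumptions for illustration) that computes the gradient of a loss L(w) with respect to the weights over fixed training pairs (xᵢ, yᵢ), and verifies the result numerically:

```python
import numpy as np

# Fixed training data (x_i, y_i), i = 1, ..., N, and a weight vector w.
np.random.seed(0)
X = np.random.randn(5, 3)  # N = 5 inputs with 3 features each
y = np.random.randn(5)     # N = 5 targets
w = np.random.randn(3)     # the weights we differentiate with respect to

def loss(w):
    # Assumed loss: mean squared error, L(w) = (1/N) Σᵢ (xᵢ·w - yᵢ)²
    return np.mean((X @ w - y) ** 2)

def grad(w):
    # Analytic gradient via the chain rule: ∇L(w) = (2/N) Xᵀ(Xw - y)
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Check the analytic gradient against a centered finite-difference estimate.
eps = 1e-6
numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad(w), numeric))  # True
```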
Why do we take the gradient with respect to the weight parameters?
The training data is given and fixed, while the weights are variables we control; therefore, we compute the gradient of the loss with respect to the weights and use it to update them so that the loss decreases.
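Since the data is fixed, the gradient tells us how to change the weights. Here is a hedged, one-step sketch of gradient descent on a single weight (the loss, values, and learning rate are all assumptions):

```python
# One gradient-descent step for L(w) = (w*x - y)^2 with fixed data x, y.
x, y = 3.0, 6.0      # fixed training example; we cannot change the data
w = 0.5              # weight: the only variable we control
learning_rate = 0.05

grad_w = 2.0 * (w * x - y) * x  # dL/dw by the chain rule
w -= learning_rate * grad_w     # step against the gradient

print(w)  # 1.85; the loss drops from 20.25 to about 0.20 (the optimum is w = 2.0)
```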