In this section, we will understand an intuition about backpropagation. This is a way of computing gradients using the chain rule. Understanding this process and its subtleties is critical for you to be able to understand and effectively develop, design, and debug neural networks.
In general, given a function f(x), where x is a vector of inputs, we want to compute the gradient of f at x denoted by ∇(f(x)). This is because in the case of neural networks, the function f is basically a loss function (L) and the input x is the combination of weights and training data. The symbol ∇ is pronounced as nabla:
(xi, yi ) i = 1......N
Why do we take the gradient on weight parameters?
It is given that the training data ...