Computing partial derivatives is repeated thousands upon thousands of times while training a neural network, so the process must be as efficient as possible.
In the previous sections, we showed how a loss function makes it possible to link the model's output, the input, and the label. If we represent the whole neural network architecture as a graph, it's easy to see that, given an input instance, we are just performing a sequence of mathematical operations (multiplying each input by a parameter, summing the results of those multiplications, and applying the non-linearity function to the sum) in an ordered manner. At the input of this graph, we have the input ...
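The ordered sequence of operations described above (multiply, sum, apply the non-linearity) can be sketched for a single neuron as follows. This is only a minimal illustration: the function name `neuron_forward` is hypothetical, and the sigmoid is assumed as the non-linearity, since the text does not name a specific one.

```python
import math


def neuron_forward(inputs, weights):
    """Forward pass of one neuron, written as three ordered graph steps."""
    # Step 1: multiply each input by its parameter (weight).
    products = [x * w for x, w in zip(inputs, weights)]
    # Step 2: add the multiplication results.
    total = sum(products)
    # Step 3: apply the non-linearity to the sum
    # (sigmoid is an assumption made for this sketch).
    return 1.0 / (1.0 + math.exp(-total))


# The two products are 0.5 and -0.5, so the sum is 0 and the sigmoid gives 0.5.
output = neuron_forward([1.0, 2.0], [0.5, -0.25])
print(output)  # 0.5
```

Each step consumes only the result of the step before it, which is what makes the graph representation natural: every intermediate value is a node whose partial derivatives can later be computed locally.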