In this appendix, we’ll look at the mathematics that underlies neural network training: the backpropagation algorithm. But before we dive in, let’s take a step back and ask: What are we trying to do when training a neural network?
At its core (issues such as overfitting aside), we want the training process to adjust the parameters of the neural network (based on the training data) such that the network will generate accurate predictions. There are two components to this: generating accurate predictions and adjusting the parameters.
Let’s consider for a moment the task of classification, in which the neural network is asked to predict the class of an example, given some input data. There are many different ways to quantify how good predictions are for classification: accuracy, F1 score, negative log likelihood, and so on. All of these are valid measures, but some of them are much harder to optimize than others. For example, the accuracy of our network might not change at all if we change any given parameter in our network by a small amount, because accuracy only changes when a predicted class flips. As a function of the parameters, accuracy is therefore piecewise constant: its gradient is zero wherever it is defined and carries no information about which direction to move in. Thus, directly optimizing accuracy using gradient-based methods is not possible (i.e., “accuracy” is not differentiable in any useful sense).
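To make this concrete, here is a minimal sketch in plain Python. The one-parameter logistic model, the toy data, and all function names are assumptions invented for illustration; they are not part of any library. The sketch nudges the single weight by a small amount and compares how accuracy and negative log likelihood respond.

```python
import math

# Hypothetical toy data: (input, label) pairs, invented for illustration.
DATA = [(-2.0, 0), (-1.0, 0), (0.5, 1), (1.5, 1), (-0.5, 1)]


def predict_prob(w, x):
    """Probability of class 1 under a one-parameter logistic model (assumed)."""
    return 1.0 / (1.0 + math.exp(-w * x))


def accuracy(w):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((predict_prob(w, x) >= 0.5) == bool(y) for x, y in DATA)
    return correct / len(DATA)


def negative_log_likelihood(w):
    """Mean negative log probability assigned to the true labels."""
    total = 0.0
    for x, y in DATA:
        p = predict_prob(w, x)
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(DATA)


w, eps = 1.0, 1e-3  # current weight and a small perturbation

# No predicted class flips for such a small nudge, so accuracy is unchanged.
acc_change = accuracy(w + eps) - accuracy(w)

# The predicted probabilities shift slightly, so the NLL moves as well.
nll_change = negative_log_likelihood(w + eps) - negative_log_likelihood(w)

print("accuracy change:", acc_change)
print("NLL change:", nll_change)
```

With this particular data, the accuracy change is exactly zero while the negative log likelihood changes by a small nonzero amount. That small change is the signal a gradient-based optimizer relies on.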
Conversely, other measures, such as negative log likelihood, will increase or decrease as we change the parameter values, even by a small amount. If we restrict ourselves to the class of differentiable loss functions (such as negative log ...