In the previous section, we learned how to normalize the data and initialize the weights, and how choosing a good activation function can dramatically reduce a neural network's training time.
In this section, we'll go a step further by optimizing the algorithm itself and the way we update the weights during the backward pass.
To do this, we need to revisit how a neural network learns. We begin with a training set of m examples, each of which has n features, and for each example we also have its corresponding target value:
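As a minimal sketch of this setup, the training set can be represented as an m-by-n feature matrix paired with a vector of m target values. The array names and sizes below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Assumed sizes for illustration: m examples, each with n features.
m, n = 1000, 20

rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))  # feature matrix: one row per training example
y = rng.integers(0, 2, size=m)   # one target value per example

assert X.shape == (m, n)  # m examples, n features each
assert y.shape == (m,)    # one label per example
```

Keeping the data in this (examples, features) layout is what lets a forward pass process the whole training set, or a mini-batch of it, with a single matrix operation.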
What we want is for a neural network to learn from these examples ...