In this section, we will explore several techniques that help us train a neural network quickly: preprocessing the data so that all features have a similar scale, randomly initializing the weights to avoid exploding or vanishing gradients, and using more effective activation functions than the sigmoid function.
We begin with data normalization and then build some intuition for how it works. Suppose we have two features, X1 and X2, that take different ranges of values (X1 from 2 to 5 and X2 from 1 to 2), as depicted in the following diagram:
We will begin by calculating the ...
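As a minimal sketch, assume the normalization is standardization: subtract each feature's mean, then divide by its standard deviation. This is one common choice, not necessarily the exact procedure this section goes on to describe. The function name `standardize` and the sample values below are hypothetical. The values are chosen only so that they fall within the ranges given for X1 (2 to 5) and X2 (1 to 2). After the transformation, both features are centered at zero with unit spread, so neither dominates simply because of its scale.

```python
import statistics


def standardize(values):
    """Rescale feature values to zero mean and unit (population) standard deviation.

    Hypothetical helper for illustration; assumes standardization is the
    normalization being discussed.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature has no spread to rescale, so only center it.
        return [v - mean for v in values]
    return [(v - mean) / std for v in values]


# Hypothetical samples within the ranges from the text:
# X1 spans 2 to 5, X2 spans 1 to 2.
x1 = [2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 1.25, 1.5, 2.0]

x1_norm = standardize(x1)
x2_norm = standardize(x2)

for name, column in (("X1", x1_norm), ("X2", x2_norm)):
    print(name, [round(v, 3) for v in column])
```

In practice, the mean and standard deviation are computed on the training set only. The same values are then reused to transform validation and test data, so that every split is scaled consistently.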