The cost functions for neural networks do not differ significantly from those for other models. The choice can, however, affect how well the model trains, because the loss interacts with the gradient of the output function.
Historically, the mean squared error (MSE) was a common choice, but it slowed down training when combined with binary sigmoid or multi-class softmax outputs. The gradients of these output functions can be very small (for example, in the flat regions of the sigmoid function, a condition known as saturation), so that backpropagation takes a long time to achieve significant parameter updates, which in turn slows down training. The use of the cross-entropy family of loss functions greatly improved the performance of these models by counteracting this saturation: the logarithm in the loss undoes the exponential in the output unit, so the gradient remains large whenever the prediction is far from the target.
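To make the saturation argument concrete, consider a single sigmoid output unit with logit z and target y = 1. The gradient of the MSE loss with respect to z contains the factor σ'(z) = σ(z)(1 − σ(z)), which vanishes whenever the sigmoid saturates, while the binary cross-entropy gradient simplifies to σ(z) − y. The following minimal NumPy sketch (an illustrative example, not taken from the text) compares the two gradients for a badly wrong, saturated prediction:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid output unit."""
    return 1.0 / (1.0 + np.exp(-z))

# A badly wrong prediction: large negative logit, so sigmoid(z) ~ 0
# while the target is 1 -- the flat, saturated region of the sigmoid.
z, y = -8.0, 1.0
y_hat = sigmoid(z)

# Gradient of the MSE loss 0.5 * (y_hat - y)**2 w.r.t. z:
# (y_hat - y) * sigmoid'(z) = (y_hat - y) * y_hat * (1 - y_hat)
grad_mse = (y_hat - y) * y_hat * (1 - y_hat)

# Gradient of binary cross-entropy w.r.t. z simplifies to (y_hat - y):
# the log in the loss cancels the exp in the sigmoid.
grad_xent = y_hat - y

print(f"prediction:             {y_hat:.6f}")
print(f"MSE gradient:           {grad_mse:.6f}")    # ~ -3e-4: nearly vanished
print(f"cross-entropy gradient: {grad_xent:.6f}")   # ~ -1.0: still large
```

Even though the prediction is maximally wrong, the MSE gradient is roughly three orders of magnitude smaller than the cross-entropy gradient, which is why training with MSE on saturating outputs can stall.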