Understanding gradient descent

When we talked about the perceptron earlier in this chapter, we identified three essential ingredients needed for training: training data, a cost function, and a learning rule. While the learning rule worked well for a single perceptron, it unfortunately did not generalize to MLPs, so researchers had to devise a more general rule.

If you think about how we measure the success of a classifier, we usually do so with the help of a cost function (also known as a loss function). Typical examples are the number of misclassifications made by the network or the mean squared error. This function usually depends on the parameters we are trying to tweak; in neural networks, these parameters are the weight coefficients. ...
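To make this concrete, here is a minimal sketch (not from the book) of gradient descent on a mean squared error cost for a simple linear model. The function names `mse_loss`, `mse_gradient`, and `gradient_descent` are illustrative choices, not part of any library API; the idea is only to show how the cost depends on the weights and how repeatedly stepping against the gradient tweaks them:

```python
import numpy as np

def mse_loss(w, X, y):
    # Mean squared error of a linear model with weight vector w
    return np.mean((X @ w - y) ** 2)

def mse_gradient(w, X, y):
    # Gradient of the MSE cost with respect to w
    return 2 * X.T @ (X @ w - y) / len(y)

def gradient_descent(X, y, lr=0.1, n_steps=100):
    # Start from zero weights and repeatedly step downhill
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * mse_gradient(w, X, y)
    return w
```

For example, fitting inputs `X = [[1.], [2.], [3.]]` to targets `y = [3., 6., 9.]` drives the single weight toward 3, the slope that minimizes the cost.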
