Understanding gradient descent

When we talked about the perceptron earlier in this chapter, we identified three essential ingredients needed for training: training data, a cost function, and a learning rule. While the learning rule worked great for a single perceptron, it unfortunately did not generalize to multilayer perceptrons (MLPs), so researchers had to come up with a more general rule.

If you think about how we measure the success of a classifier, we usually do so with the help of a cost function. Typical examples are the number of misclassifications the network makes or the mean squared error. This function (also known as a loss function) usually depends on the parameters we are trying to tweak; in neural networks, these parameters are the weight coefficients. ...
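
Because the cost function depends on the weights, we can lower the cost by nudging the weights in the direction in which the cost decreases fastest, and the gradient of the cost tells us that direction. The following is a minimal NumPy sketch of gradient descent on the mean squared error of a simple linear model; the toy data, variable names, and learning rate are illustrative assumptions, not code from the book:

    import numpy as np

    def mse(w, X, y):
        """Mean squared error of predictions X @ w against targets y."""
        return np.mean((X @ w - y) ** 2)

    def mse_gradient(w, X, y):
        """Gradient of the MSE with respect to the weights w."""
        return 2.0 / len(y) * X.T @ (X @ w - y)

    # Toy data: 100 samples, 3 features, generated from known weights
    # (purely illustrative).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=100)

    # Gradient descent: repeatedly take a small step against the gradient.
    w = np.zeros(3)
    learning_rate = 0.1
    for step in range(200):
        w -= learning_rate * mse_gradient(w, X, y)

    print(mse(w, X, y))  # cost driven down close to the noise floor
    print(w)             # recovered weights close to true_w

The same idea carries over to neural networks: the weights play the role of w, and the only extra work is computing the gradient of the cost with respect to every weight in the network.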
