Gradient Descent

Let’s look for a better train() algorithm. The job of train() is to find the parameters that minimize the loss, so let’s start by focusing on the loss itself:

import numpy as np

def loss(X, Y, w, b):
    # mean squared error between predictions and labels
    return np.average((predict(X, w, b) - Y) ** 2)
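Note that loss() calls predict(), which was defined in the previous chapters. As a reminder, here is a minimal sketch of it, assuming the single-variable linear model used so far:

def predict(X, w, b):
    # the linear model: input times weight, plus bias
    return X * w + b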

Look at this function’s arguments. X and Y contain the inputs and the labels, so they never change from one call of loss() to the next. To simplify the upcoming discussion, let’s also temporarily fix b at 0, so that w is the only variable left.
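To make that concrete, here is a hypothetical one-argument wrapper (the name loss_on_w is an assumption, not from the book) that pins b at 0, assuming X and Y are already loaded:

def loss_on_w(w):
    # X and Y are fixed data and b is pinned at 0, so the loss depends on w alone
    return loss(X, Y, w, 0)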

How does the loss change as w changes? I put together a program that plots loss for w ranging from -1 to 4, and draws a green cross on its minimum value. Check out the following graph (as usual, that code is among the book’s source code):
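Here is a minimal sketch of such a plotting program. The data points are made up purely for illustration, and predict() and loss() are repeated from above so the sketch runs on its own; the book’s actual version is among its source code:

import numpy as np
import matplotlib.pyplot as plt

# same predict() and loss() as above, repeated so this sketch runs on its own
def predict(X, w, b):
    return X * w + b

def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)

# made-up example data, purely for illustration
X = np.array([13, 2, 14, 23, 13, 1, 18, 10, 26, 3])
Y = np.array([33, 16, 32, 51, 27, 16, 34, 17, 29, 15])

weights = np.linspace(-1, 4, 200)             # w ranging from -1 to 4
losses = [loss(X, Y, w, 0) for w in weights]  # b temporarily fixed at 0

plt.plot(weights, losses)
best = np.argmin(losses)                      # index of the minimum loss
plt.plot(weights[best], losses[best], "gX")   # green cross on the minimum
plt.xlabel("w")
plt.ylabel("loss")
plt.show()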

Nice curve! Let’s ...
