Stochastic gradient descent

The method we've just seen of calculating gradient descent is often called batch gradient descent, because each update to the coefficients happens inside an iteration over all the data in a single batch. With very large amounts of data, each iteration can be time-consuming and waiting for convergence could take a very long time.

An alternative method of gradient descent is called stochastic gradient descent or SGD. In this method, the estimates of the coefficients are continually updated as the input data is processed. The update method for stochastic gradient descent looks like this:

Stochastic gradient descent

In fact, this is identical to batch ...

Get Clojure for Data Science now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.