After discussing the basics of logistic regression, it's useful to introduce the SGDClassifier class, which implements a very popular algorithm that can be applied to several different loss functions. The idea behind stochastic gradient descent is iterating a weight update based on the gradient of the loss function:

w(k+1) = w(k) - γ∇L(w(k))
However, instead of considering the whole dataset, the update procedure is applied on batches randomly extracted from it. In the preceding formula, L is the loss function we want to minimize (as discussed in Chapter 2, Important Elements in Machine Learning) and gamma (eta0 in scikit-learn) is the learning rate.