After discussing the basics of logistic regression, it's useful to introduce the SGDClassifier class, which implements a very common algorithm that can be applied to several different loss functions. The idea behind SGD is to minimize a cost function by iterating a weight update based on the gradient:

w(t+1) = w(t) − η ∇w L

However, instead of computing the gradient over the whole dataset, the update is applied to small batches randomly extracted from it (for this reason, the method is often also called mini-batch gradient descent). In the preceding formula, L is the cost function we want to minimize with respect to the parameters (as discussed ...