In gradient descent-based logistic regression models, all training samples are used to update the weights in each iteration. Hence, if the number of training samples is large, the whole training process becomes time-consuming and computationally expensive, as we just witnessed in our last example.
Fortunately, a small tweak makes logistic regression suitable for large datasets. For each weight update, only one training sample is consumed, instead of the complete training set. The model moves a step based on the error calculated on that single sample. Once all samples have been used, one iteration finishes. This advanced version of gradient descent is called ...
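To make the per-sample update concrete, here is a minimal sketch in NumPy. The function name update_weights_sgd, the learning rate, and the tiny two-feature toy set are illustrative assumptions, not code from the earlier example; the model also omits a bias term for brevity.

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def update_weights_sgd(X_train, y_train, weights, learning_rate):
    """One iteration (epoch): update the weights once per training sample."""
    for x, y in zip(X_train, y_train):
        prediction = sigmoid(np.dot(x, weights))
        error = y - prediction
        # Move a step based on the error of this single sample only
        weights += learning_rate * error * x
    return weights

# Hypothetical toy data, just to show the update loop in action
X_train = np.array([[6., 7.], [2., 4.], [3., 6.], [4., 7.]])
y_train = np.array([0, 0, 1, 1])

weights = np.zeros(X_train.shape[1])
for _ in range(10):   # 10 iterations (epochs) over the training set
    weights = update_weights_sgd(X_train, y_train, weights, learning_rate=0.01)
print(weights)
```

Each pass over the data performs as many weight updates as there are samples, rather than a single update computed from the whole set, which is what makes this variant cheap per step.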