Chapter 4. Model Training Patterns
Machine learning models are usually trained iteratively, and this iterative process is informally called the training loop. In this chapter, we discuss what the typical training loop looks like, and catalog a number of situations in which you might want to do something different.
Typical Training Loop
Machine learning models can be trained using different types of optimization. Decision trees are often built node by node based on an information gain measure. In genetic algorithms, the model parameters are represented as genes, and the optimization method involves techniques that are based on evolutionary theory. However, the most common approach to determining the parameters of machine learning models is gradient descent.
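To make the gradient descent idea concrete, here is a minimal sketch (not the book's code) of full-batch gradient descent fitting a weight and bias for a one-feature linear regression with mean squared error; the data, learning rate, and step count are all hypothetical choices for illustration:

```python
import numpy as np

# Hypothetical data: a one-feature linear-regression problem, y ≈ 3x + 1.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + 1.0 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.1

for step in range(200):
    preds = w * X + b                    # forward pass
    error = preds - y
    loss = np.mean(error ** 2)           # mean squared error

    grad_w = 2 * np.mean(error * X)      # dLoss/dw
    grad_b = 2 * np.mean(error)          # dLoss/db

    # Gradient descent update: move parameters against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)   # approaches 3.0 and 1.0
```

Here every update uses the entire dataset; the next section covers the mini-batch variant that scales this idea to large datasets.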
Stochastic Gradient Descent
On large datasets, gradient descent is applied to mini-batches of the input data to train everything from linear models and boosted trees to deep neural networks (DNNs) and support vector machines (SVMs). This is called stochastic gradient descent (SGD), and extensions of SGD (such as Adam and Adagrad) are the de facto optimizers used in modern-day machine learning frameworks.
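As an illustration of mini-batch SGD in a modern framework, the following sketch trains a small Keras model with the Adam optimizer; the synthetic dataset, layer sizes, batch size, and learning rate are assumptions chosen purely for the example:

```python
import numpy as np
import tensorflow as tf

# Hypothetical training data: 1,000 examples with 10 numeric features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10)).astype("float32")
y_train = (X_train.sum(axis=1) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam is an extension of SGD; batch_size sets the mini-batch size
# on which each gradient step is computed.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(X_train, y_train, batch_size=32, epochs=5)
```

With batch_size=32, each epoch makes one optimizer step per mini-batch rather than one step over the full dataset, which is what makes the procedure stochastic.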
Because SGD requires training to take place iteratively on small batches of the training dataset, training a machine learning model happens in a loop. SGD finds a minimum, but is not a closed-form solution, so we have to detect whether model convergence has occurred. Because of this, the error (called the loss) on ...