Chapter 5. Hyperparameter Optimization
Training a deep model and training a good deep model are very different things. While it’s easy enough to copy-paste some TensorFlow code from the internet to get a first prototype running, it’s much harder to transform that prototype into a high-quality model. The process of taking a prototype to a high-quality model involves many steps. We’ll explore one of these steps, hyperparameter optimization, in the rest of this chapter.
To first approximation, hyperparameter optimization is the process of tweaking all parameters of a model not learned by gradient descent. These quantities are called “hyperparameters.” Consider fully connected networks from the previous chapter. While the weights of fully connected networks can be learned from data, the other settings of the network can’t. These hyperparameters include the number of hidden layers, the number of neurons per hidden layer, the learning rate, and more. How can you systematically find good values for these quantities? Hyperparameter optimization methods provide our answer to this question.
Recall that we mentioned previously that model performance is tracked on a held-out “validation” set. Hyperparameter optimization methods systematically try multiple choices for hyperparameters on the validation set. The best-performing set of hyperparameter values is then evaluated on a second held-out “test” set to gauge the true model performance. Different hyperparameter optimization methods differ ...