August 2019
Intermediate to advanced
242 pages
5h 45m
English
As for our optimization method, we use Adam. You may recall from Chapter 2, What is a Neural Network and How Do I Train One?, that the Adam solver belongs to the class of solvers that use a dynamic learning rate. In vanilla SGD, we fix a single learning rate for every parameter. With Adam, the effective learning rate is adapted per parameter, which gives us more control in cases where sparsity of data (vectors) is a problem. Additionally, Adam borrows from root mean square propagation (RMSProp): it keeps a running average of recent squared gradients alongside a running average of the gradients themselves, and scales each update by them. This tracks how quickly the shape of our optimization surface is changing and, by doing so, improves how our network handles noise in the data.
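To make the per-parameter behavior concrete, here is a minimal sketch of a single Adam update next to a vanilla SGD update. It is written in plain Python for illustration only and is not the implementation used by any particular library. The function names `adam_step` and `sgd_step` are hypothetical, and the default values for `lr`, `beta1`, `beta2`, and `eps` are the commonly quoted Adam defaults, assumed here rather than taken from this chapter.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based step counter."""
    for i, g in enumerate(grads):
        # Running average of the gradient (first moment).
        m[i] = beta1 * m[i] + (1 - beta1) * g
        # Running average of the squared gradient (second moment, the RMSProp part).
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction: both averages start at zero, so early values are too small.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Each parameter's step is scaled by its own gradient history.
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params, m, v


def sgd_step(params, grads, lr=0.001):
    """Vanilla SGD: the same fixed learning rate for every parameter."""
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    # Two parameters whose gradients differ by four orders of magnitude.
    grads = [100.0, 0.01]
    adam_params, _, _ = adam_step([1.0, 1.0], grads, [0.0, 0.0], [0.0, 0.0], t=1)
    sgd_params = sgd_step([1.0, 1.0], grads)
    print("Adam:", adam_params)
    print("SGD: ", sgd_params)
```

On the first step, Adam moves both parameters by roughly `lr`, regardless of gradient scale, whereas SGD moves the first parameter ten thousand times further than the second. That normalization is what helps when some parameters only rarely receive a meaningful gradient, as with sparse inputs.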
Now, let's talk about the layers of our neural network. Our first two layers are standard feedforward networks ...