Building Machine Learning Systems with Python - Third Edition
by Luis Pedro Coelho, Willi Richert, Matthieu Brucher
Training neural networks
We haven't talked about training neural networks that much. Basically, all optimizations are a gradient descent, the questions are what step length are we going to use, and should we take the previous gradients into account or not?
When computing one gradient, there is also the question of whether we do this for just one new sample or we do it for a multitude of samples at the same time (the batch). Basically, we almost never feed only one sample at a time (as the size of a batch varies, all the placeholders have a first dimension set to None indicating that it will be dynamic).
This also imposes the creation of a special layer, batch_normalization, that scales the gradient (up or down, so that the layers can be updated ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access