July 2019
Intermediate to advanced
512 pages
19h 39m
English
Now, we will look at a small variant of the Adam algorithm called Adamax. Let's recall the equation of the second-order moment in Adam:

As you may have noticed from the preceding equation, we scale the gradients inversely proportional to the
norm of the current and past gradients (
norm basically means the square of values):
Instead of having just , can we generalize it to the norm? In general, when we have ...
Read now
Unlock full access