January 2018
Beginner to intermediate
284 pages
8h 35m
English
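In very deep networks, such as recurrent networks unrolled over many time steps (and possibly recursive ones), the gradient can quickly become very small (vanish) or very large (explode). When it explodes, the locality assumption behind gradient descent breaks down: a single update can move the parameters far outside the region where the gradient is a good approximation of the loss. The remedy for the exploding case, first introduced by Mikolov, is to clip gradients to a maximum value, which makes a big difference when training RNNs.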
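As a rough illustration of clipping, the sketch below shows two common variants in plain Python. The function names `clip_by_value` and `clip_by_norm`, the threshold of 5.0, and the sample gradient are hypothetical choices made for this example, not taken from the text. Clipping each component matches the "maximum value" wording above; rescaling by the L2 norm is a frequently used alternative that preserves the gradient's direction.

```python
import math

def clip_by_value(grads, max_value):
    """Clamp each gradient component to the range [-max_value, max_value]."""
    return [max(-max_value, min(max_value, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]

# Hypothetical "exploding" gradient for three parameters.
exploding = [30.0, -40.0, 0.5]

clipped_value = clip_by_value(exploding, 5.0)  # each component capped at +/-5
clipped_norm = clip_by_norm(exploding, 5.0)    # same direction, L2 norm of 5
```

Clipping by value can change the direction of the update because large components are cut more than small ones, whereas clipping by norm only shortens the step. Which variant and threshold work best is usually settled by experiment.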