August 2019
Intermediate to advanced
242 pages
5h 45m
English
With RNNs, we have multiple copies of the same network, one for each timestep. Therefore, we need a way to backpropagate the error derivatives and calculate weight updates for each of the parameters in every timestep. The way we do this is simple. We're following the contours of a function so that we can try and optimize its shape. We have multiple copies of the trainable parameters, one at each timestep, and we want these copies to be consistent with each other so that when we calculate all the gradients for a given parameter, we take their average. We use this to update the parameter at t0 for each iteration of the learning process.
The goal is to calculate the error as that accumulates across timesteps, and ...
Read now
Unlock full access