As suggested in the previous section, RNNs are called recurrent because they apply the same transformation to every element of a sequence, so that each output depends on the outcome of prior iterations. As a result, RNNs maintain an internal state that captures information about previous elements in the sequence, acting as a form of memory.
The following diagram shows the computational graph implied by a simple RNN unit with a hidden state that learns two weight matrices during training, one applied to the current input and one to the previous hidden state:
A non-linear transformation of the sum of the two matrix multiplications—for example, using the tanh or ReLU activation functions—becomes the unit's new hidden state, which is passed on to the computation for the next element of the sequence.
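To make the recurrence concrete, the following minimal NumPy sketch unrolls this update over a toy sequence; the names (W_xh, W_hh, rnn_step) and dimensions are illustrative assumptions rather than code from the text:

```python
import numpy as np

# Illustrative sketch of the recurrence described above.
# W_xh projects the current input, W_hh projects the previous hidden state;
# these are the two weight matrices the RNN unit learns during training.
input_dim, hidden_dim = 3, 4
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """Apply the same transformation at every time step: the new hidden
    state is a non-linear function (tanh here) of the sum of the two
    matrix multiplications plus a bias."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

# Unroll the recurrence over a toy sequence of five input vectors.
sequence = rng.normal(size=(5, input_dim))
h = np.zeros(hidden_dim)      # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)      # h carries information from prior elements
print(h)
```

Because the same `rnn_step` function and the same weights are reused at every step, the final value of `h` depends on the entire sequence, which is the "memory" behavior described above.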