Gated Recurrent Units
GRUs are an alternative to LSTM units. They were first described by a team led by another significant figure in the history of deep learning, Yoshua Bengio. Their initial paper, Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (2014), offers an interesting way of thinking about these techniques for making our RNNs more effective.
Specifically, they draw an equivalence between the Tanh activation function in a vanilla RNN and LSTM/GRU units, describing the latter as activations too. The difference between these activations lies in whether information is retained unchanged or updated in the units themselves. In effect, the use of the Tanh function means that your ...
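To make the contrast between the two kinds of activation concrete, here is a minimal NumPy sketch of a single time step for a Tanh unit and for a GRU. It assumes the gate equations as given in the 2014 paper, with bias terms omitted for brevity. The function names, the parameter dictionary, and the layer sizes are illustrative assumptions, not code from this book.

```python
import numpy as np


def sigmoid(a):
    """Logistic function, used for both gates."""
    return 1.0 / (1.0 + np.exp(-a))


def tanh_rnn_step(x, h_prev, W, U):
    """Vanilla RNN step: the new state is recomputed in full at every step."""
    return np.tanh(W @ x + U @ h_prev)


def gru_step(x, h_prev, p):
    """GRU step (biases omitted): gates decide how much old state survives."""
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)         # reset gate
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)         # update gate
    h_cand = np.tanh(p["W"] @ x + p["U"] @ (r * h_prev))  # candidate state
    # z near 1 keeps the old state; z near 0 replaces it with the candidate.
    return z * h_prev + (1.0 - z) * h_cand


# Hypothetical sizes and random weights, purely for illustration.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
params = {
    name: rng.normal(scale=0.1, size=(n_hid, n_in if name.startswith("W") else n_hid))
    for name in ("W_r", "U_r", "W_z", "U_z", "W", "U")
}
x = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)

h_rnn = tanh_rnn_step(x, h_prev, params["W"], params["U"])
h_gru = gru_step(x, h_prev, params)
```

In this sketch the Tanh step always produces a freshly computed state, whereas the GRU step returns an element-wise blend of the previous state and a candidate, so individual units can carry information forward unchanged when the update gate is close to 1.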