One way to reduce the computational complexity of hidden state recurrences is to connect a unit's hidden state to the prior unit's output rather than to its hidden state. The resulting RNN has a lower capacity than the architecture discussed previously, but different time steps are now decoupled and can be trained in parallel.
However, successfully learning relevant past information requires that the output samples used during training actually capture this information, so that backpropagation can adjust the network parameters accordingly. Feeding the previous ground-truth outcome values alongside the input vectors during training is called teacher forcing.
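The following is a minimal sketch of this idea in plain NumPy, not an implementation from the book: the hidden state at each step receives the previous output rather than the previous hidden state, and during training the model's own prediction is replaced by the known target (teacher forcing). All names, dimensions, and data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2          # input, hidden, output sizes (assumed)

W_in  = rng.normal(scale=0.1, size=(n_hidden, n_in))    # input -> hidden
W_out = rng.normal(scale=0.1, size=(n_hidden, n_out))   # previous output -> hidden
V     = rng.normal(scale=0.1, size=(n_out, n_hidden))   # hidden -> output

def step(x_t, o_prev):
    """One forward step: the hidden state depends on x_t and the prior output."""
    h_t = np.tanh(W_in @ x_t + W_out @ o_prev)
    o_t = V @ h_t
    return h_t, o_t

# Toy input sequence and targets, fabricated purely for illustration
X = rng.normal(size=(4, n_in))
Y = rng.normal(size=(4, n_out))

o_prev = np.zeros(n_out)
for x_t, y_t in zip(X, Y):
    h_t, o_t = step(x_t, o_prev)
    # Teacher forcing: feed the true target y_t into the next step instead of
    # the model's own prediction o_t; at inference time, o_t would be used.
    o_prev = y_t
```

Because each hidden state in this sketch depends only on the current input and the known previous target, the time steps no longer form a chain through hidden states, which is what allows them to be computed and trained in parallel.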
Connections from the output to the subsequent hidden states can also be used in combination with ...