Long short-term memory

In 1997, Hochreiter and Schmidhuber proposed a modified RNN architecture, called the long short-term memory (LSTM), as a solution to the vanishing gradient problem. The hidden layer of the RNN is replaced by an LSTM cell. The LSTM cell consists of three gates: the forget gate, the input gate, and the output gate. These gates control how much long-term and short-term memory the cell generates and retains. Each gate uses the sigmoid function, which squashes its input to a value between 0 and 1. Next, we will see how the outputs of the various gates are calculated. In case the expressions seem daunting to you, do not worry: in practice we will be using the TensorFlow tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.static_rnn APIs, which handle these computations for us.
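For reference, the gate computations follow the standard LSTM formulation (the notation here is the common one, not necessarily identical to every textbook's). Here x_t is the input at time step t, h_{t-1} is the previous hidden state, C_t is the cell state, \sigma is the sigmoid function, \odot is element-wise multiplication, and each gate has its own weight matrix W and bias b:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            (input gate)
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)     (candidate cell state)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t   (cell state update)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            (output gate)
h_t = o_t \odot \tanh(C_t)                        (hidden state)

The forget gate decides how much of the old cell state C_{t-1} to keep, the input gate decides how much of the new candidate to write, and the output gate decides how much of the cell state is exposed as the hidden state h_t.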
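As a concrete illustration of the API mentioned above, the following is a minimal sketch of building an LSTM layer with tf.contrib.rnn.BasicLSTMCell and tf.contrib.rnn.static_rnn, assuming TensorFlow 1.x; the sequence length, input size, and number of hidden units are illustrative placeholders, not values from this chapter:

import tensorflow as tf

# Illustrative dimensions (placeholders, not from the text)
time_steps = 5     # length of the input sequence
input_size = 10    # features per time step
num_units = 32     # hidden units in the LSTM cell

# A batch of input sequences: [batch_size, time_steps, input_size]
x = tf.placeholder(tf.float32, [None, time_steps, input_size])

# static_rnn expects a Python list of time_steps tensors,
# each of shape [batch_size, input_size]
inputs = tf.unstack(x, time_steps, axis=1)

# The cell implements the gate equations shown earlier internally
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units)

# outputs: list of hidden states h_t, one per time step
# state: the final (cell state, hidden state) tuple
outputs, state = tf.contrib.rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32)

The outputs list holds the hidden state h_t for every time step; for a task such as sequence classification, you would typically feed the last element, outputs[-1], into a dense layer.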