Another alternative to working with basic recurrent layers is to use a Long Short-Term Memory (LSTM) unit. Like the GRU that we discussed in the previous section, this recurrent layer uses gates to control how information flows through it. The difference is that the LSTM has three gates (forget, input, and output) where the GRU has two (update and reset).
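One concrete consequence of the extra gate is that an LSTM layer has more weights to learn than a GRU layer of the same size. The sketch below counts them under a simplifying assumption: each gate, plus the candidate value that both layer types compute, gets one input weight matrix, one recurrent weight matrix, and one bias vector. The function names are hypothetical, and real frameworks may count biases differently (PyTorch, for example, keeps two bias vectors per block), so treat the totals as illustrative.

```python
def gated_rnn_param_count(input_size: int, hidden_size: int, weight_blocks: int) -> int:
    # Each block: input weights (hidden x input), recurrent weights (hidden x hidden), one bias (hidden).
    per_block = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return weight_blocks * per_block


def gru_param_count(input_size: int, hidden_size: int) -> int:
    # Update gate, reset gate, and candidate state: 3 blocks.
    return gated_rnn_param_count(input_size, hidden_size, 3)


def lstm_param_count(input_size: int, hidden_size: int) -> int:
    # Forget gate, input gate, output gate, and candidate cell content: 4 blocks.
    return gated_rnn_param_count(input_size, hidden_size, 4)


print(gru_param_count(10, 20))   # 1860
print(lstm_param_count(10, 20))  # 2480
```

Under this assumption an LSTM layer always has 4/3 as many parameters as a GRU layer with the same input and hidden sizes.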
The following diagram outlines the structure of the LSTM layer:
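The LSTM unit has a cell state that is central to how this layer type works. The cell state is carried from one time step to the next and changes only through gated updates: the forget gate decides how much of the existing cell state to keep, and the input gate decides how much new information to add to it. Because these updates are additive rather than a complete rewrite, the cell state can hold on to information over many time steps. The LSTM layer also has a hidden state, but this state serves a different role: it is the output of the layer at each time step, computed from the cell state and filtered by the output gate.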
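To make the split between the cell state and the hidden state concrete, here is a minimal sketch of a single LSTM time step in plain Python, assuming the common textbook formulation (sigmoid gates, a tanh candidate, and tanh applied to the cell state before the output gate). The names `lstm_step`, `affine`, and the `params` layout are hypothetical illustrations, not the API of any deep learning library; plain lists are used so the example has no dependencies.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical parameter layout: gate name -> (W input weights, U recurrent weights, b bias).
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W @ x + U @ h + b using plain lists."""
    return [
        sum(w * xj for w, xj in zip(W[r], x))
        + sum(u * hj for u, hj in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (h, c)."""
    f = [sigmoid(v) for v in affine(*params["f"], x, h_prev)]    # forget gate: how much old cell state to keep
    i = [sigmoid(v) for v in affine(*params["i"], x, h_prev)]    # input gate: how much new content to write
    o = [sigmoid(v) for v in affine(*params["o"], x, h_prev)]    # output gate: how much cell state to expose
    g = [math.tanh(v) for v in affine(*params["g"], x, h_prev)]  # candidate content
    # Cell state: a gated, additive update rather than a full overwrite.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # Hidden state: the step's output, a squashed and gated view of the cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Demo with 1-dimensional input and state. The biases are chosen so the forget gate
# is almost fully open (sigmoid(10)) and the input gate almost closed (sigmoid(-10)).
params: Params = {
    "f": ([[0.0]], [[0.0]], [10.0]),
    "i": ([[0.0]], [[0.0]], [-10.0]),
    "o": ([[0.0]], [[0.0]], [0.0]),
    "g": ([[1.0]], [[0.0]], [0.0]),
}
h, c = [0.0], [1.0]
for x_t in [0.5, -0.3, 0.8, 0.1, -0.9]:
    h, c = lstm_step([x_t], h, c, params)
print(c, h)  # the cell state stays close to its starting value of 1.0
```

With the forget gate held open and the input gate held shut, the cell state barely moves across the five inputs, while the hidden state is recomputed from it at every step. This is the mechanism that lets an LSTM keep information over long sequences; in a trained network the gate values depend on the input and the previous hidden state rather than on fixed biases.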
In short, the LSTM has a long-term memory modeled as ...