RNNs gradually lose historical context as sequences grow longer, which makes them hard to train for many practical purposes. This is where LSTMs come into the picture! Introduced by Hochreiter and Schmidhuber in 1997, LSTMs can retain information from very long sequences and mitigate issues such as the vanishing gradient problem. An LSTM cell typically contains three gates, namely the input, output, and forget gates (the candidate cell-state update is sometimes counted as a fourth gate).
The following diagram shows a high-level representation of a single LSTM cell:
The input gate controls whether an incoming signal is allowed to alter the state of the memory cell. The output gate controls when the value stored in the memory cell is propagated to other neurons, and the forget gate determines how much of the previous cell state is retained or discarded.
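To make the interaction between these gates concrete, here is a minimal NumPy sketch (not from the original text) of a single forward step through one LSTM cell. The function name lstm_cell_step and the stacked weight layout (W, U, b holding the parameters for all gates at once) are illustrative assumptions rather than a reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a single LSTM cell (illustrative sketch).

    x      : input vector at the current time step, shape (input_dim,)
    h_prev : previous hidden state, shape (hidden_dim,)
    c_prev : previous memory cell state, shape (hidden_dim,)
    W, U, b: stacked parameters for the input, forget, and output gates and
             the candidate cell state, shapes (4*hidden_dim, input_dim),
             (4*hidden_dim, hidden_dim), and (4*hidden_dim,)
    """
    hidden_dim = h_prev.shape[0]
    z = W @ x + U @ h_prev + b                   # pre-activations for all gates at once

    i = sigmoid(z[0 * hidden_dim:1 * hidden_dim])  # input gate: how much new information to let in
    f = sigmoid(z[1 * hidden_dim:2 * hidden_dim])  # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden_dim:3 * hidden_dim])  # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden_dim:4 * hidden_dim])  # candidate values for the memory cell

    c = f * c_prev + i * g                       # update the memory cell state
    h = o * np.tanh(c)                           # hidden state passed on to other neurons
    return h, c

# Toy usage: run the cell over a short random sequence
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
W = rng.standard_normal((4 * hidden_dim, input_dim)) * 0.1
U = rng.standard_normal((4 * hidden_dim, hidden_dim)) * 0.1
b = np.zeros(4 * hidden_dim)
h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)
for x in rng.standard_normal((5, input_dim)):
    h, c = lstm_cell_step(x, h, c, W, U, b)
```

Note how the forget gate multiplies the previous cell state while the input gate scales the new candidate values; because the cell state is updated additively, gradients can flow back through long sequences far more easily than in a plain RNN.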