The sequence-to-sequence model is built with the LSTM variant of the RNN. LSTMs are far better at retaining long-term dependencies across long sequences of text: their three gates (input, forget, and output) let the network decide what information to keep and what to discard at each step. A basic RNN cannot capture such long-term dependencies because of the vanishing gradient problem inherent in its architecture.
In this model, we use two LSTMs. The first LSTM encodes the input tweet into a context vector. This context vector is simply the last hidden state of the encoder LSTM, n being the dimension ...
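As a rough illustration of this two-LSTM encoder-decoder layout, here is a minimal sketch in Keras. The framework choice, the vocabulary size, the embedding dimension, and the value of n are all illustrative assumptions, not details taken from the text; the point is only that the encoder's final states are passed to the decoder as its initial states.

```python
# Minimal sketch of a two-LSTM encoder-decoder (assumed Keras API;
# vocab_size, embed_dim, and n are illustrative values, not from the text).
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

vocab_size = 10000   # assumed vocabulary size
embed_dim = 128      # assumed embedding dimension
n = 256              # assumed hidden-state (context vector) dimension

# Encoder: reads the input tweet and keeps only its final states.
encoder_inputs = Input(shape=(None,))
enc_emb = Embedding(vocab_size, embed_dim)(encoder_inputs)
_, state_h, state_c = LSTM(n, return_state=True)(enc_emb)
encoder_states = [state_h, state_c]   # context vector: last hidden/cell state

# Decoder: generates the reply, initialised with the encoder's context vector.
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(vocab_size, embed_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(n, return_sequences=True, return_state=True)(
    dec_emb, initial_state=encoder_states)
outputs = Dense(vocab_size, activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```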