We just learned how the seq2seq model works and how it translates a sentence from the source language to the target language. We learned that the context vector is simply the encoder's hidden state from the final time step; it is meant to capture the meaning of the input sentence, and it is used by the decoder to generate the target sentence.
But when the input sentence is long, a single hidden state from the final time step cannot capture the meaning of the whole sentence. So, instead of taking only the last hidden state as the context vector and feeding it to the decoder, we take the sum of all the hidden states from the encoder and use that as the context vector.
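To make the difference concrete, here is a minimal NumPy sketch of the two choices of context vector. The sentence length, hidden size, and variable names are assumptions chosen purely for illustration; in a real model the hidden states would come from an RNN encoder rather than random numbers:

```python
import numpy as np

# Suppose the encoder has processed a 10-word sentence and produced one
# hidden state per time step (hidden size 4 here, just for illustration).
num_steps, hidden_size = 10, 4
encoder_hidden_states = np.random.randn(num_steps, hidden_size)

# Plain seq2seq: the context vector is only the final hidden state.
context_last = encoder_hidden_states[-1]

# The idea described above: sum all the encoder hidden states, so every
# time step contributes to the context vector passed to the decoder.
context_sum = encoder_hidden_states.sum(axis=0)

print(context_last.shape, context_sum.shape)  # both (4,)
```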
Let's say the input sentence ...