The decoder RNN is a 2-layer GRU with vertical residual connections (as explained previously):

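from keras.layers import GRU, Add

def get_decoder_RNN_output(input_data):
    # input_data: (batch, timesteps, 256); the last dimension must be 256
    # so that the residual Add() layers below are shape-compatible.
    rnn1 = GRU(256, return_sequences=True)(input_data)
    inp2 = Add()([input_data, rnn1])  # first vertical residual connection
    # return_sequences=True here too, so every decoder timestep keeps its own output
    rnn2 = GRU(256, return_sequences=True)(inp2)
    decoder_rnn = Add()([inp2, rnn2])  # second vertical residual connection
    return decoder_rnn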

Note that the first GRU layer must be defined with return_sequences=True. With this setting, the layer returns one output per input timestep, so a sequence in yields a sequence out, of shape (batch, timesteps, 256). Without it, the first GRU would return only a single output (its last hidden state) for the entire input sequence, whereas the second GRU expects a sequence as input.
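To make the shape argument concrete, here is a minimal NumPy sketch (not Keras' actual implementation) of a GRU forward pass with a `return_sequences` flag, followed by the two-layer stack with vertical residual connections. The helper names `make_gru_params`, `gru_forward` and `decoder_rnn_sketch`, the random weights, and the batch and timestep sizes are assumptions chosen for illustration. The gating follows the GRU variant Keras uses with `reset_after=False`, and in this sketch both layers return sequences so that each decoder timestep keeps its own output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru_params(input_dim, units, rng):
    """Hypothetical helper: small random weights for one GRU layer."""
    def w(rows):
        return rng.normal(scale=0.1, size=(rows, units))
    return {
        "Wz": w(input_dim), "Uz": w(units), "bz": np.zeros(units),
        "Wr": w(input_dim), "Ur": w(units), "br": np.zeros(units),
        "Wh": w(input_dim), "Uh": w(units), "bh": np.zeros(units),
    }

def gru_forward(x, p, return_sequences=False):
    """Run a GRU over x of shape (batch, timesteps, input_dim).

    Returns (batch, timesteps, units) if return_sequences is True,
    otherwise only the last hidden state, of shape (batch, units).
    """
    batch, timesteps, _ = x.shape
    units = p["bz"].shape[0]
    h = np.zeros((batch, units))
    outputs = []
    for t in range(timesteps):
        x_t = x[:, t, :]
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        # reset gate applied before the recurrent matmul (reset_after=False style)
        h_tilde = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = z * h + (1.0 - z) * h_tilde  # Keras' update convention
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # one output per timestep
    return h  # only the final hidden state

def decoder_rnn_sketch(x, p1, p2):
    """Two stacked GRUs with vertical residual connections."""
    rnn1 = gru_forward(x, p1, return_sequences=True)
    inp2 = x + rnn1  # residual: needs matching (batch, timesteps, units)
    rnn2 = gru_forward(inp2, p2, return_sequences=True)
    return inp2 + rnn2

rng = np.random.default_rng(0)
units = 256
x = rng.normal(size=(2, 5, units))  # (batch, timesteps, features)
p1 = make_gru_params(units, units, rng)
p2 = make_gru_params(units, units, rng)

print(gru_forward(x, p1, return_sequences=True).shape)   # (2, 5, 256)
print(gru_forward(x, p1, return_sequences=False).shape)  # (2, 256)
print(decoder_rnn_sketch(x, p1, p2).shape)               # (2, 5, 256)
```

In this sketch, the last timestep of the sequence output is exactly the single vector returned when `return_sequences=False`; the flag only decides whether the intermediate hidden states are kept, which is what the next GRU layer in the stack needs.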