In the previous chapters, we learned that an LSTM (or even a plain RNN) returns its result from the last time step: the hidden state values of the final time step are passed on to the next layer. Now imagine a scenario where the output is five-dimensional, and those five dimensions are five separate outputs rather than softmax values for five classes. To make this concrete, suppose we want to predict not just the stock price on the next date, but the stock prices for the next five days. Or, we want to predict not just the next word, but the sequence of the next five words for a given input sequence.
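The following is a minimal sketch of this idea, assuming a Keras Sequential model; the window length, layer size, and dummy data are placeholders chosen purely for illustration. The last LSTM time step's hidden state is fed into a Dense layer with five units, so the model produces a five-dimensional regression output, for example, the next five days' prices:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

window = 10      # assumed: number of past days fed into the model
n_features = 1   # assumed: one value (the price) per time step

model = Sequential([
    LSTM(32, input_shape=(window, n_features)),  # hidden state of the last time step
    Dense(5)                                     # five regression outputs (next five days)
])
model.compile(optimizer='adam', loss='mse')

# Dummy data, only to show the expected shapes
X = np.random.rand(100, window, n_features)  # 100 sequences, each 10 days long
y = np.random.rand(100, 5)                   # the next five days for each sequence
model.fit(X, y, epochs=1, verbose=0)
```

Note that this simple variant only widens the output layer; producing a genuine output sequence (one prediction per future time step) requires the different network structure discussed next.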
This situation calls for a different approach to building the network. In the following section, we will look into multiple ...