Chapter 6. Recurrent Neural Networks

In this chapter, we’ll cover recurrent neural networks (RNNs), a class of neural network architectures designed for handling sequences of data. The neural networks we’ve seen so far treated each batch of data they received as a set of independent observations: there was no notion of some MNIST digits arriving before or after others, in either the fully connected neural networks of Chapter 4 or the convolutional neural networks of Chapter 5. Many kinds of data, however, are intrinsically ordered, whether time series data, which one might deal with in an industrial or financial context, or language data, in which the characters, words, sentences, and so on are ordered. Recurrent neural networks are designed to take in sequences of such data and return a correct prediction as output, whether that prediction is the price of a financial asset on the following day or the next word in a sentence.

Dealing with ordered data will require three kinds of changes to the fully connected neural networks we saw in the first few chapters. First, it will involve “adding a new dimension” to the ndarrays we feed our neural networks. Previously, the data we fed our neural networks was intrinsically two-dimensional: each ndarray had one dimension representing the number of observations and another representing the number of features; another way to think of this is that each observation was a one-dimensional vector.
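To make this shape change concrete, here is a minimal NumPy sketch; the array names and sizes are illustrative, not taken from the book. A fully connected network consumes a two-dimensional batch of shape (batch_size, num_features), while a recurrent network consumes a three-dimensional batch of shape (batch_size, sequence_length, num_features):

```python
import numpy as np

batch_size, sequence_length, num_features = 32, 10, 4

# Fully connected setting: each observation is a one-dimensional
# vector of features, so a batch is a two-dimensional ndarray.
flat_batch = np.random.randn(batch_size, num_features)
print(flat_batch.shape)  # (32, 4)

# Recurrent setting: each observation is itself a sequence of such
# vectors, so the batch gains a sequence dimension and becomes
# a three-dimensional ndarray.
sequence_batch = np.random.randn(batch_size, sequence_length, num_features)
print(sequence_batch.shape)  # (32, 10, 4)

# A single observation from the sequential batch is now a 2D array:
# one feature vector per time step.
print(sequence_batch[0].shape)  # (10, 4)
```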
