Chapter 8. Sequence Modeling with Keras
So far, we have looked at documents as bags-of-words. This is a common and straightforward approach for many NLP tasks, but it produces a low-fidelity model of language. Word order is essential to how meaning is encoded and decoded in language, and to incorporate it, we will need to model sequences.
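To make the limitation concrete, here is a minimal illustration (the sentences are invented for this example) of how a bag-of-words representation discards word order: two sentences with opposite meanings produce identical bags.

```python
from collections import Counter

# Two sentences with opposite meanings
sentence_a = "dog bites man"
sentence_b = "man bites dog"

# A bag-of-words keeps only the counts of each token,
# so the order that distinguishes the sentences is lost.
bow_a = Counter(sentence_a.split())
bow_b = Counter(sentence_b.split())

print(bow_a == bow_b)  # True: identical bags, despite opposite meanings
```

Any model built on these bags alone cannot tell the two sentences apart, which is exactly the fidelity that sequence models aim to recover.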
When people refer to sequences in a machine learning context, they are generally talking about sequences of data in which data points are not independent of the data points around them. We can still use features derived from the data point itself, as in general machine learning, but now we can also use the data and labels from nearby data points. For example, if we are trying to determine whether the token “produce” is being used as a noun or a verb, knowing the surrounding words is very informative. If the token before it is “might,” that indicates “produce” is a verb. If “the” is the preceding token, that indicates it is a noun. These other words give us context.
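The idea of using nearby tokens as features can be sketched in a few lines. This is a hypothetical helper, not from any library; the function name `context_features` and the `<s>`/`</s>` boundary markers are assumptions made for illustration.

```python
def context_features(tokens, i, window=1):
    """Return the token at position i along with its neighbors as features."""
    features = {"token": tokens[i]}
    for offset in range(1, window + 1):
        # Pad with sentence-boundary markers when the window runs off either end.
        features[f"prev_{offset}"] = tokens[i - offset] if i - offset >= 0 else "<s>"
        features[f"next_{offset}"] = tokens[i + offset] if i + offset < len(tokens) else "</s>"
    return features

tokens = "they might produce the produce".split()
print(context_features(tokens, 2))  # prev_1 is "might": suggests a verb
print(context_features(tokens, 4))  # prev_1 is "the": suggests a noun
```

A classifier trained on such features can disambiguate the two occurrences of “produce” even though the token itself is identical in both positions.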
What if “to” precedes “produce”? That could still indicate either a noun or a verb, so we need to look back further. This gives us the concept of windows—the amount of context we want to capture. Many algorithms use a fixed amount of context, set as a hyperparameter. Some algorithms, such as LSTMs, can instead learn how long to remember the context.
Sequence problems come from different domains, and in different data formats. In this book ...