Long short-term memory

As we saw earlier, the standard RNN has some limitations; in particular, it suffers from the vanishing gradient problem. The LSTM architecture was proposed by Sepp Hochreiter and Jürgen Schmidhuber (ftp://ftp.idsia.ch/pub/juergen/lstm.pdf) as a solution to the long-term dependency problem that RNNs face.

LSTM cells differ from vanilla RNN cells in a few ways. Firstly, they contain what we call a memory block, which is a set of recurrently connected subnets. Secondly, each memory block contains not only self-connected memory cells but also three multiplicative units that act as the input, output, and forget gates.
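To make the roles of these gates concrete before we look at the diagram, here is a minimal NumPy sketch of one forward step through a single LSTM cell. The function name, the stacked-gate weight layout, and the toy dimensions are illustrative assumptions for this sketch, not the book's notation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a single LSTM cell (illustrative sketch).

    x      : input vector at the current time step, shape (d,)
    h_prev : hidden state from the previous step, shape (n,)
    c_prev : cell (memory) state from the previous step, shape (n,)
    W, U   : input and recurrent weights, shapes (4n, d) and (4n, n)
    b      : bias vector, shape (4n,)
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # pre-activations for all four gates at once

    i = sigmoid(z[0*n:1*n])         # input gate: how much new information to write
    f = sigmoid(z[1*n:2*n])         # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2*n:3*n])         # output gate: how much of the cell state to expose
    g = np.tanh(z[3*n:4*n])         # candidate values for the cell state

    c = f * c_prev + i * g          # update the self-connected memory cell
    h = o * np.tanh(c)              # new hidden state, read out through the output gate
    return h, c

# Toy usage: one step with random weights (d = 3 inputs, n = 4 hidden units)
rng = np.random.default_rng(0)
d, n = 3, 4
W = rng.normal(size=(4*n, d))
U = rng.normal(size=(4*n, n))
b = np.zeros(4*n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
```

The key point the sketch illustrates is that the forget and input gates control what is removed from and added to the memory cell, so gradients can flow through `c` across many time steps without being repeatedly squashed, which is what mitigates the vanishing gradient problem.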

Let's take a look at what a single LSTM cell looks like, and then we will dive into the nitty-gritty ...
