April 2020
Intermediate to advanced
380 pages
9h 24m
English
As expected, we need a large collection of general-purpose images, each paired with one or more possible captions. As we saw in the previous section, Understanding an image caption generator, a single image can have several captions, none of which need be wrong. For this reason, we will work with the Flickr8k dataset, which provides several reference captions for each image. We will also need the GloVe embeddings created by Jeffrey Pennington, Richard Socher, and Christopher D. Manning. In short, GloVe represents each word as a dense vector learned from word co-occurrence statistics, so that words used in similar contexts receive similar vectors; these vectors help the model form meaningful sentences from a set of disjoint words.
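To make the two inputs concrete, here is a minimal sketch of how they might be read into memory. It rests on assumptions that are not stated in the text: that the caption file uses one `image.jpg#index<TAB>caption` entry per line, and that the embedding file is plain text with a word followed by space-separated floats on each line. The function names and the toy sample data are hypothetical and exist only for illustration; the toy vectors are not real GloVe values.

```python
import math
from collections import defaultdict


def group_captions(lines):
    """Group captions by image.

    Assumes each line looks like 'image.jpg#0<TAB>caption text'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_tag, caption = line.split("\t", 1)
        image_id = image_tag.split("#", 1)[0]  # drop the '#index' suffix
        captions[image_id].append(caption.strip())
    return dict(captions)


def parse_embeddings(lines):
    """Parse 'word v1 v2 ... vn' lines into a word -> vector dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy data (hypothetical, for illustration only).
sample_captions = [
    "dog.jpg#0\tA dog runs on grass .",
    "dog.jpg#1\tA brown dog is running .",
    "cat.jpg#0\tA cat sleeps .",
]
sample_embeddings = [
    "dog 0.9 0.1 0.0",
    "puppy 0.8 0.2 0.0",
    "car 0.0 0.1 0.9",
]

captions_by_image = group_captions(sample_captions)
embeddings = parse_embeddings(sample_embeddings)
```

With real files, the same functions could be fed an open file handle instead of a list, since both only iterate over lines. Comparing `cosine_similarity(embeddings["dog"], embeddings["puppy"])` with `cosine_similarity(embeddings["dog"], embeddings["car"])` shows the idea behind the embeddings: related words sit closer together in vector space than unrelated ones.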