Generating image captions

RNNs can also handle problems that require a fixed input to be transformed into a variable-length sequence. Image captioning takes a fixed input picture and outputs a variable-length description of that picture. These models use a CNN to encode the image, and then feed the CNN's output into an RNN, which generates the caption one word at a time.
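
The following is a minimal sketch of that encoder/decoder pattern, not the exact model built later in the chapter: a fixed-size CNN feature vector conditions an LSTM that predicts the next word of the caption. The vocabulary size, caption length, and layer widths are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 5000      # assumed vocabulary size
max_caption_len = 20   # assumed maximum caption length
feature_dim = 2048     # size of the CNN image feature vector

# Encoder side: the image arrives as a fixed-length CNN feature vector.
image_features = layers.Input(shape=(feature_dim,), name="image_features")
image_embedding = layers.Dense(256, activation="relu")(image_features)

# Decoder side: the partial caption generated so far, as token IDs.
caption_input = layers.Input(shape=(max_caption_len,), name="caption_tokens")
word_embedding = layers.Embedding(vocab_size, 256, mask_zero=True)(caption_input)
lstm_out = layers.LSTM(256)(word_embedding)

# Merge the image and text representations and predict the next word.
merged = layers.add([image_embedding, lstm_out])
next_word = layers.Dense(vocab_size, activation="softmax")(merged)

model = Model(inputs=[image_features, caption_input], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time, the same model is called in a loop: start from a start-of-sentence token, predict the most likely next word, append it to the caption, and repeat until an end token or the maximum length is reached.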

We'll be building a neural captioning model based on the Flickr30k dataset, provided by the University of California, Berkeley, which you can find in the corresponding GitHub repository for this chapter. In this case, we'll be utilizing pretrained image embeddings ...
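
The exact embedding pipeline is not shown in the excerpt above, but one common way to produce pretrained image embeddings is to run each photo through a CNN pretrained on ImageNet and keep the pooled feature vector. The sketch below uses InceptionV3 purely as an example, and the image path is hypothetical:

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained CNN with the classification head removed; global average
# pooling yields a fixed 2,048-dimensional vector per image.
encoder = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def embed_image(path):
    """Load one image and return its 2,048-d feature vector."""
    img = image.load_img(path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return encoder.predict(x)[0]

features = embed_image("flickr30k_images/example.jpg")  # hypothetical path
print(features.shape)  # (2048,)
```

These fixed-length vectors are what the captioning RNN consumes in place of the raw pixels, which keeps the caption model small and fast to train.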
