January 2018
Chen et al. (https://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf) proposed a method that retrieves images from text and text from images, that is, a bi-directional mapping between the two. The following image shows one person describing an image in natural language and another person visualizing the scene from that description:

Captions can be retrieved by connecting the image encoder and the text encoder through a shared latent space, as shown here:
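
To make the shared latent space idea concrete, here is a minimal NumPy sketch of bi-directional retrieval. It does not reproduce the recurrent model from the paper: the two encoders are assumed to be simple linear projections, the data is synthetic, and all names (`encode`, `retrieve`, `image_weights`, `text_weights`) are hypothetical. The projection weights are computed with a pseudo-inverse only to stand in for encoders that have already been trained to align paired images and captions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def encode(features, weights):
    """Hypothetical linear encoder: project features into the latent space."""
    return l2_normalize(features @ weights)


def retrieve(query, candidates, top_k=1):
    """Return the indices of the top_k candidates most similar to the query."""
    scores = candidates @ query  # cosine similarity, since rows are unit length
    return np.argsort(-scores)[:top_k]


rng = np.random.default_rng(0)
num_pairs, latent_dim, image_dim, text_dim = 5, 8, 32, 16

# Synthetic data: each image/caption pair is generated from one latent vector.
latent = rng.normal(size=(num_pairs, latent_dim))
image_mix = rng.normal(size=(latent_dim, image_dim))
text_mix = rng.normal(size=(latent_dim, text_dim))
image_features = latent @ image_mix
text_features = latent @ text_mix + 0.01 * rng.normal(size=(num_pairs, text_dim))

# Stand-ins for trained encoder weights (assumption: a real system would
# learn these from paired data rather than compute a pseudo-inverse).
image_weights = np.linalg.pinv(image_mix)  # shape (image_dim, latent_dim)
text_weights = np.linalg.pinv(text_mix)    # shape (text_dim, latent_dim)

image_embeddings = encode(image_features, image_weights)
text_embeddings = encode(text_features, text_weights)

# Image -> text: find the caption closest to image 2 in the latent space.
caption_for_image_2 = retrieve(image_embeddings[2], text_embeddings)
# Text -> image: find the image closest to caption 4 in the latent space.
image_for_caption_4 = retrieve(text_embeddings[4], image_embeddings)
print(caption_for_image_2, image_for_caption_4)
```

Because both modalities end up in the same space, the same `retrieve` function serves both directions; only the roles of query and candidates are swapped.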
