Show and tell
One of the fundamental approaches for image captioning is proposed in Oriol Vinyals' paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge (https://arxiv.org/abs/1609.06647). This session provides a deeper dive into the show and tell algorithm. At a high-level, the caption model contains two stages: the encoder stage and the decoder stage. The encoder generates an image embedding as the initial input of the decoder, and the decoder transforms the image representation to the natural language description space.
A similar process can be seen in the area of machine translation, where the goal is to translate the content written in one language to another.
Mathematically, this single joint model takes ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access