January 2018
Beginner to intermediate
284 pages
8h 35m
English
In general, image captioning approaches fall into two categories: generating the description from scratch, or searching for and retrieving it from a visual space and/or a multimodal space.
The Show and Tell model belongs to the first category: it generates descriptions from scratch, using an RNN.
Retrieval-based approaches, in turn, come in two types.
The first type treats the problem as retrieval from a joint multimodal space. The training set contains image-description pairs, or pairs of image patches and sentence segments, and a joint model is trained using both visual and textual ...
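
To make retrieval from a joint space concrete, here is a minimal sketch, not the method of any particular paper. It assumes a joint model has already mapped images and captions into the same vector space. The embeddings below are made up by hand for illustration, and the names `cosine`, `retrieve_caption`, and `caption_bank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_caption(image_vec, caption_bank):
    """Return the caption whose embedding is closest to the image embedding."""
    best_caption, _ = max(caption_bank, key=lambda item: cosine(image_vec, item[1]))
    return best_caption


# Hand-made stand-ins for embeddings that a trained joint model would produce.
caption_bank = [
    ("a dog running on the beach", [0.9, 0.1, 0.0]),
    ("a plate of pasta on a table", [0.0, 0.2, 0.9]),
    ("a man riding a bicycle", [0.1, 0.9, 0.1]),
]

# Assumed embedding of the query image in the same joint space.
query_image = [0.8, 0.2, 0.1]

best = retrieve_caption(query_image, caption_bank)
print(best)
```

With these toy vectors, the query image lies closest to the first caption. In a real system the vectors would come from the trained joint model, and the caption bank would be the training descriptions.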