August 2018
Intermediate to advanced
438 pages
12h 3m
English
This is the model architecture that ties the two previous components together. It first achieved great success in neural machine translation, where the encoder typically consumes words in one language and the decoder emits words in another. The advantage is that a single end-to-end architecture connects the two components and solves the problem jointly, instead of requiring separate, disconnected models.
The DCNN typically forms the encoder, which encodes the source input image into a fixed-length dense vector; the LSTM-based sequence model then decodes that vector into a sequence of words, giving us the desired caption. Also, as discussed earlier, this model ...
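The encode-then-decode flow described above can be sketched in miniature. This is a hypothetical NumPy-only illustration, not the book's actual code: a stand-in for the DCNN encoder projects an image to a fixed-length vector, and a plain recurrent cell (standing in for a full LSTM) unrolls that vector into token IDs. All sizes, names, and the greedy decoding loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED, HIDDEN, MAX_LEN = 50, 16, 32, 5

# "Encoder" stand-in: flatten a 64x64x3 image and project it to a
# fixed-length dense vector (a real DCNN would use conv layers).
W_enc = rng.normal(scale=0.01, size=(64 * 64 * 3, HIDDEN))

def encode(image):
    return np.tanh(image.reshape(-1) @ W_enc)        # shape: (HIDDEN,)

# "Decoder" stand-in: a single recurrent cell whose initial hidden state
# is the encoded image vector; it emits one word ID per step.
W_embed = rng.normal(scale=0.01, size=(VOCAB, EMBED))
W_xh = rng.normal(scale=0.01, size=(EMBED, HIDDEN))
W_hh = rng.normal(scale=0.01, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.01, size=(HIDDEN, VOCAB))

def decode(h, start_id=0):
    token, caption = start_id, []
    for _ in range(MAX_LEN):
        x = W_embed[token]                           # embed previous word
        h = np.tanh(x @ W_xh + h @ W_hh)             # update hidden state
        token = int(np.argmax(h @ W_out))            # greedy word choice
        caption.append(token)
    return caption

image = rng.random((64, 64, 3))
caption_ids = decode(encode(image))                  # MAX_LEN token IDs
```

In a real captioning model the encoder would be a pretrained convolutional network, the decoder an LSTM trained with teacher forcing, and the greedy loop would usually be replaced by beam search; the sketch only shows how the fixed-length vector bridges the two halves.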