A sequence-to-sequence video-captioning system

The sequence-to-sequence architecture is based on the paper Sequence to Sequence - Video to Text by Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. The paper is available at https://arxiv.org/pdf/1505.00487.pdf.

The following diagram (Figure 5.3) illustrates a sequence-to-sequence video-captioning neural network architecture based on this paper:

Figure 5.3: Sequence-to-sequence video-captioning network architecture

The sequence-to-sequence model processes the video image frames through a pre-trained convolutional ...
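To make the flow of data through such a network concrete, the following is a minimal sketch of the per-time-step input schedule, assuming the two-layer stacked LSTM layout described in the cited paper: in the encoding stage the first LSTM reads one CNN feature vector per frame while the second LSTM receives padding in place of a word, and in the decoding stage the first LSTM receives padding while the second LSTM receives the previous caption word. The function name `s2vt_schedule`, the token strings, and the `cnn_feature[t]` labels are hypothetical placeholders chosen for illustration; no actual feature extraction or training is performed here.

```python
from typing import List, Tuple

# Hypothetical special tokens, chosen only for this illustration.
PAD = "<pad>"
BOS = "<bos>"


def s2vt_schedule(num_frames: int, caption: List[str]) -> List[Tuple[str, str, str]]:
    """Return (stage, lstm1_input, lstm2_word_input) for every time step.

    Assumed layout: a two-layer LSTM stack that first encodes all frames
    and then decodes the caption, using the same stack for both stages.
    """
    steps: List[Tuple[str, str, str]] = []

    # Encoding stage: LSTM 1 reads one CNN feature per frame;
    # LSTM 2 has no word to read yet, so its word input is padding.
    for t in range(num_frames):
        steps.append(("encode", f"cnn_feature[{t}]", PAD))

    # Decoding stage: frames are exhausted, so LSTM 1 reads padding;
    # LSTM 2 reads the previous word, starting from the begin-of-sentence tag.
    for word in [BOS] + caption:
        steps.append(("decode", PAD, word))

    return steps


schedule = s2vt_schedule(num_frames=3, caption=["a", "man", "runs"])
for stage, lstm1_input, lstm2_word in schedule:
    print(f"{stage:6s}  LSTM1 <- {lstm1_input:15s}  LSTM2 word <- {lstm2_word}")
```

In a real training loop, each `cnn_feature[t]` placeholder would be a feature vector produced by the pre-trained convolutional network, and the prediction target at each decoding step would be the next caption word, ending with an end-of-sentence tag.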
