A sequence-to-sequence video-captioning system

The sequence-to-sequence architecture is based on the paper Sequence to Sequence - Video to Text by Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. The paper is available at https://arxiv.org/pdf/1505.00487.pdf.

The following diagram (Figure 5.3) illustrates a sequence-to-sequence video-captioning neural network architecture based on the preceding paper:

Figure 5.3: Sequence-to-sequence video-captioning network architecture

The sequence-to-sequence model processes the video image frames through a pre-trained convolutional ...
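The cited paper unrolls its network in two stages: the video frames are read first, and the caption is generated afterwards, with padding standing in for whichever input is absent at a given time step. The following is a minimal sketch of that time-step layout only, not of the model itself. The function name `build_schedule`, the placeholder tokens `<pad>` and `<bos>`, and the `frame_i` labels are hypothetical names chosen for illustration. In a real system, each frame label would be a feature vector produced by the pre-trained convolutional network.

```python
def build_schedule(num_frames, caption_tokens, pad="<pad>", bos="<bos>"):
    """Return (frame_input, word_input) pairs, one per time step.

    Hypothetical helper: it only lays out which input is real and which
    is padding at each step of an encode-then-decode unrolling.
    """
    steps = []
    # Encoding stage: a real frame feature at every step, no word input yet.
    for t in range(num_frames):
        steps.append((f"frame_{t}", pad))
    # Decoding stage: frame input is padded; the previous word is fed in,
    # starting from the beginning-of-sentence marker (teacher forcing).
    for word in [bos] + list(caption_tokens):
        steps.append((pad, word))
    return steps


if __name__ == "__main__":
    for frame_in, word_in in build_schedule(3, ["a", "man", "runs"]):
        print(f"{frame_in:>8} | {word_in}")
```

The total number of steps is the number of frames plus the caption length plus one for the start marker, so a single unrolled sequence covers both reading the video and writing the caption.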
