Training the model

In this section, we put all the pieces together to build the function for training the video-captioning model.

First, we build the word vocabulary dictionary from the video captions in the combined training and test datasets. Once this is done, we invoke the build_model function to create the video-captioning network, which combines the two LSTMs. Each video, with its specific start and end times, has multiple reference captions; within each batch, one of these captions is randomly selected as the target for that video. The input text captions fed to LSTM 2 are adjusted so that the starting word at time step (N+1) is <bos>, while the end word of the output text captions is set to <eos>.
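The vocabulary construction and per-batch caption sampling described above can be sketched as follows. The helper names (build_vocabulary, sample_caption), the special-token indices, and the maximum caption length are illustrative assumptions, not the book's exact implementation:

```python
import random
from collections import Counter

import numpy as np


def build_vocabulary(captions, min_count=1):
    # Count words across all captions from the combined train and test sets.
    counts = Counter(word for caption in captions
                     for word in caption.lower().split())
    # Reserve indices for the special tokens used by the decoder LSTM
    # (index assignments here are an assumption for illustration).
    word2idx = {'<pad>': 0, '<bos>': 1, '<eos>': 2, '<unk>': 3}
    for word, count in counts.items():
        if count >= min_count and word not in word2idx:
            word2idx[word] = len(word2idx)
    return word2idx


def sample_caption(captions_for_clip, word2idx, max_len=20):
    # Each clip (video id plus start/end times) has several reference
    # captions; pick one at random for this batch.
    caption = random.choice(captions_for_clip)
    # Frame the token sequence with <bos> and <eos>, then pad to max_len.
    tokens = ['<bos>'] + caption.lower().split()[:max_len - 2] + ['<eos>']
    ids = [word2idx.get(w, word2idx['<unk>']) for w in tokens]
    ids += [word2idx['<pad>']] * (max_len - len(ids))
    return np.array(ids)
```

In training, the <bos>-prefixed sequence would be fed to LSTM 2 as input, while the target sequence is the same caption shifted by one step so that it terminates in <eos>.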

Get Intelligent Projects Using Python now with O’Reilly online learning.