January 2019
Intermediate to advanced
342 pages
9h 17m
English
In the encoding stage, we process the features of each video frame (taken from the last layer of the CNN) sequentially by passing them through the time steps of LSTM 1. Each frame feature vector has 4096 dimensions. Before these high-dimensional vectors are fed to LSTM 1, they are reduced to a smaller size of 512.
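The reduction from 4096 to 512 dimensions can be pictured as a learned linear projection applied to every frame. The following is a minimal NumPy sketch under that assumption; the function name project_frame_features, the random weights, and the batch size and frame count are hypothetical and serve only to show the shapes involved, not the book's actual implementation.

```python
import numpy as np

def project_frame_features(frame_feats, weights, bias):
    """Linearly project per-frame CNN features to a smaller embedding size.

    frame_feats: array of shape (batch, n_frames, feat_dim)
    weights:     array of shape (feat_dim, embed_dim)
    bias:        array of shape (embed_dim,)
    Returns an array of shape (batch, n_frames, embed_dim).
    """
    batch, n_frames, feat_dim = frame_feats.shape
    # Flatten batch and time so one matrix multiply handles every frame.
    flat = frame_feats.reshape(-1, feat_dim)
    emb = flat @ weights + bias
    return emb.reshape(batch, n_frames, -1)

rng = np.random.default_rng(0)
batch_size, n_frames = 2, 8          # hypothetical sizes for illustration
feat_dim, embed_dim = 4096, 512      # dimensions stated in the text

video_feats = rng.standard_normal((batch_size, n_frames, feat_dim))
W = rng.standard_normal((feat_dim, embed_dim)) * 0.01  # stand-in for learned weights
b = np.zeros(embed_dim)

image_emb = project_frame_features(video_feats, W, b)
print(image_emb.shape)  # (2, 8, 512)
```

In a trained model the weights and bias would be learned together with the LSTMs; here they are random placeholders.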
LSTM 1 processes the video frame features and passes its hidden state to LSTM 2 at each time step. This continues until time step N (self.video_lstm_step). The code for the encoder is as follows:
probs = []
loss = 0.0

# Encoding Stage
for i in range(0, self.video_lstm_step):
    if i > 0:
        tf.get_variable_scope().reuse_variables()
    with tf.variable_scope("LSTM1"):
        output1, state1 = self.lstm1(image_emb[:,i,:], ...
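To make the data flow of the encoding loop concrete, here is a self-contained NumPy sketch of two stacked LSTM cells stepped over the frame embeddings. It is an illustration under stated assumptions, not the book's TensorFlow code: the helpers lstm_step, init_lstm, and encode are hypothetical, the weights are random, the sizes are deliberately small, and LSTM 2 is assumed to take only the hidden output of LSTM 1 as its input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One step of a standard LSTM cell.

    x: (batch, input_dim), h and c: (batch, hidden_dim)
    W: (input_dim + hidden_dim, 4 * hidden_dim), b: (4 * hidden_dim,)
    """
    z = np.concatenate([x, h], axis=1) @ W + b
    i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def init_lstm(rng, input_dim, hidden_dim):
    """Random stand-in parameters (a trained model would learn these)."""
    W = rng.standard_normal((input_dim + hidden_dim, 4 * hidden_dim)) * 0.1
    b = np.zeros(4 * hidden_dim)
    return W, b

def encode(image_emb, params1, params2, hidden_dim):
    """Run both LSTMs over every video time step and return their final states."""
    batch, n_steps, _ = image_emb.shape
    h1 = np.zeros((batch, hidden_dim))
    c1 = np.zeros((batch, hidden_dim))
    h2 = np.zeros((batch, hidden_dim))
    c2 = np.zeros((batch, hidden_dim))
    for i in range(n_steps):
        # LSTM 1 consumes the embedding of frame i.
        h1, c1 = lstm_step(image_emb[:, i, :], h1, c1, *params1)
        # Assumption: LSTM 2 receives LSTM 1's hidden output at the same step.
        h2, c2 = lstm_step(h1, h2, c2, *params2)
    return (h1, c1), (h2, c2)

rng = np.random.default_rng(0)
# Toy sizes for illustration; the text uses 512-dimensional embeddings.
batch_size, video_lstm_step, embed_dim, hidden_dim = 2, 5, 16, 8
image_emb = rng.standard_normal((batch_size, video_lstm_step, embed_dim))
params1 = init_lstm(rng, embed_dim, hidden_dim)
params2 = init_lstm(rng, hidden_dim, hidden_dim)

state1, state2 = encode(image_emb, params1, params2, hidden_dim)
print(state1[0].shape, state2[0].shape)  # (2, 8) (2, 8)
```

The same parameters are reused at every time step, which is the role played by reuse_variables() in the TensorFlow loop above.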