In this chapter, we will build a neural network that takes 10 video frames as input and outputs a probability distribution over 101 action categories. The network is built on the conv3d operation in TensorFlow and is inspired by the work of D. Tran et al., Learning Spatiotemporal Features with 3D Convolutional Networks. However, we have simplified the model so that it is easier to explain in a chapter. We also use some techniques that Tran et al. do not mention, such as batch normalization and dropout.
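To get a feel for the tensor shapes involved before writing the network, the following is a minimal sketch in plain Python, not the chapter's actual model. It computes the output shape of a single 3D convolution using TensorFlow's documented SAME and VALID padding rules, and shows that a softmax over 101 logits yields a valid probability distribution. The 10 frames and 101 categories come from the text above; the 112x112 RGB frame size, the 3x3x3 kernel, the 64 filters, and the helper names conv3d_output_shape and softmax are assumptions made purely for illustration.

```python
import math


def conv3d_output_shape(input_shape, kernel, strides, out_channels, padding="SAME"):
    """Output shape of a 3D convolution over (depth, height, width, channels).

    Hypothetical helper that applies TensorFlow's padding rules per dimension:
      SAME  -> ceil(size / stride)
      VALID -> ceil((size - kernel + 1) / stride)
    """
    sizes = input_shape[:3]  # depth (frames), height, width
    if padding == "SAME":
        out = [math.ceil(s / st) for s, st in zip(sizes, strides)]
    elif padding == "VALID":
        out = [math.ceil((s - k + 1) / st) for s, k, st in zip(sizes, kernel, strides)]
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return tuple(out) + (out_channels,)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


NUM_FRAMES = 10    # from the text: 10 video frames per clip
NUM_CLASSES = 101  # from the text: 101 action categories
FRAME_SIZE = 112   # assumption: frame height and width are not given in the text

clip_shape = (NUM_FRAMES, FRAME_SIZE, FRAME_SIZE, 3)  # one RGB clip, batch dimension omitted

# Assumed layer: 3x3x3 kernel, stride 1, 64 filters.
same_shape = conv3d_output_shape(clip_shape, (3, 3, 3), (1, 1, 1), 64, "SAME")
valid_shape = conv3d_output_shape(clip_shape, (3, 3, 3), (1, 1, 1), 64, "VALID")

# Uniform logits stand in for the network's final layer output.
probs = softmax([0.0] * NUM_CLASSES)

print(same_shape, valid_shape, len(probs))
```

With SAME padding and stride 1, the temporal and spatial dimensions are preserved, so stacked layers only shrink the clip where pooling or strides are applied. With VALID padding, each 3x3x3 convolution removes two positions per dimension, which matters quickly when the clip is only 10 frames deep.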
Now, create a new Python file named nets.py and add the following code:
import tensorflow as tf
from utils import print_variables, print_layers
from tensorflow.contrib.layers.python.layers.layers ...