Now we start creating the network by combining convolutional, max pooling, dense (feedforward), and recurrent (LSTM) layers to classify each frame of a video clip. First, we need to define some hyperparameters and declare the necessary fields, as shown here:
private static MultiLayerConfiguration conf;
private static MultiLayerNetwork net;
private static String modelPath = "bin/ConvLSTM_Model.zip";
private static int NUM_CLASSES;
private static int nTrainEpochs = 100;
Here, NUM_CLASSES is the number of classes in UCF101, computed as the number of directories in the dataset base directory:
NUM_CLASSES = reader.labelMap().size();
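To make the idea of "one class per directory" concrete, here is a minimal standalone sketch that counts the immediate subdirectories of a dataset folder using only the Java standard library. It is an illustration, not the reader's actual implementation: the class name ClassCounter, the method countClassDirectories(), and the default path "UCF101" are hypothetical, and the sketch assumes that the base directory contains exactly one subdirectory per action class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project code shown above.
public class ClassCounter {

    // Counts the immediate subdirectories of baseDir.
    // Assumption: each subdirectory corresponds to one class label;
    // regular files in baseDir are ignored.
    public static int countClassDirectories(Path baseDir) throws IOException {
        try (Stream<Path> entries = Files.list(baseDir)) {
            return (int) entries.filter(Files::isDirectory).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // "UCF101" is a placeholder; pass the real dataset path as the first argument.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : "UCF101");
        if (!Files.isDirectory(baseDir)) {
            System.out.println("Not a directory: " + baseDir);
            return;
        }
        System.out.println("Number of classes: " + countClassDirectories(baseDir));
    }
}
```

If the dataset folder also holds directories that are not classes (for example, a folder of annotations), this count will be too high, which is one reason to prefer a label map built by the reader itself.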
Then we start the training by calling the networkTrainer() method. Well, as I stated ...