The general idea behind this method is that not all weights in a neural network are equally important, so we can reduce the size of the network by discarding the unimportant ones. Technically, this can be done in the following way:
- Train a large network:
  - Leverage any previously trained network, say VGG16, and retrain only the fully connected layers
- Rank the filters, or create a sparse network based on a criterion:
  - Rank each filter using any feasible criterion (say, the Taylor criterion) and prune the lowest-ranking filters; or, alternatively, replace all values below a certain threshold with zeros, resulting in a sparse network
- Fine-tune and repeat:
  - Perform several iterations of training on the pruned network to recover accuracy, then repeat the pruning step
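The threshold-based variant above can be sketched as follows. This is a minimal illustration using NumPy rather than any particular framework; `magnitude_prune` and the threshold value are hypothetical choices for demonstration, not part of the original method description.

```python
import numpy as np

def magnitude_prune(weights, threshold):
    """Return a copy of `weights` with entries below `threshold`
    in absolute value set to zero, yielding a sparse layer."""
    mask = np.abs(weights) >= threshold
    return weights * mask

# Toy weight matrix standing in for one layer of a trained network.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))

pruned = magnitude_prune(w, threshold=0.5)

# Fraction of weights that were zeroed out.
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
```

In practice the threshold is usually chosen per layer or derived from a target sparsity level, and the prune/fine-tune cycle is repeated until accuracy drops below an acceptable limit.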