You may notice that we employed L2 regularization in the MXNet solution, which penalizes large weights in order to avoid overfitting, but we did not do so in this Keras solution. This accounts for the slight difference in classification accuracy on the testing set (99.30% versus 98.65%). We are now going to employ regularization in our Keras solution as well, specifically dropout this time.
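For reference, here is a minimal sketch of how an L2 weight penalty can be attached to Keras layers (the layer sizes, the 784-dimensional input, and the 0.01 penalty strength are illustrative assumptions, not the exact model from this solution):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras import regularizers

# kernel_regularizer adds an L2 penalty on the layer's weights to the loss
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,),
          kernel_regularizer=regularizers.l2(0.01)),
    Dense(10, activation='softmax',
          kernel_regularizer=regularizers.l2(0.01))
])
```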
Dropout is a regularization technique for neural networks initially proposed by Geoffrey Hinton et al. in 2012 (Improving Neural Networks by Preventing Co-adaptation of Feature Detectors, in Neural and Evolutionary Computing). As the name implies, it ignores (drops) a small subset of neurons, hidden or visible, that are randomly selected at each training iteration.
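A minimal sketch of adding dropout between layers in Keras (the dropout rate of 0.25 and the layer sizes are illustrative assumptions, not the tuned values from this solution):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    # randomly zero out 25% of this layer's outputs at each training step;
    # dropout is disabled automatically at inference time
    Dropout(0.25),
    Dense(10, activation='softmax')
])
```

Note that the Dropout layer is only active during training; Keras scales the surviving activations up during training so that no rescaling is needed at test time.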