One really great way to reduce overfitting in deep neural networks is to employ a technique called dropout. Dropout does exactly what it says: it drops neurons out of a hidden layer. Here's how it works.
For every minibatch, we randomly choose to turn off some of the nodes in each hidden layer. Imagine we had some hidden layer where we had implemented dropout, and we chose the drop probability to be 0.5. That means, for every minibatch, for every neuron, we flip a coin to see whether we use that neuron. In doing so, we'd randomly turn off about half of the neurons in that hidden layer.
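Here's a minimal NumPy sketch of that coin-flip idea. The function name `dropout` and the rescaling by 1/(1 - drop_prob) (the common "inverted dropout" formulation, which keeps the expected activation the same at train and test time) are illustrative choices, not something prescribed above:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5, training=True):
    """Apply (inverted) dropout to a layer's activations."""
    if not training or drop_prob == 0.0:
        return activations
    # "Flip a coin" for every neuron: keep it with probability 1 - drop_prob.
    keep_mask = rng.random(activations.shape) >= drop_prob
    # Rescale the surviving activations so their expected value is unchanged
    # (inverted dropout), so nothing special is needed at test time.
    return activations * keep_mask / (1.0 - drop_prob)

# Example: a hidden layer with 10 neurons, for a minibatch of 4 examples.
hidden = rng.standard_normal((4, 10))
print(dropout(hidden, drop_prob=0.5))
```

Running this, roughly half the entries in each row come out as zero, and the rest are scaled up by a factor of 2.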
If we do this ...