Dropout
Dropout works by probabilistically removing neurons from designated layers during training, or by dropping individual connections. At training time, each neuron is kept or dropped at random according to a Bernoulli distribution with p = 0.5 (at test time all neurons are used, but the weights are halved to compensate). This reduces co-adaptation between neurons: a feature can no longer be useful only in the presence of particular other features. Each neuron therefore becomes more robust, and training speed improves significantly. The following figure illustrates the network structure with Dropout at two epochs. With Dropout, we have essentially formed a distinct network architecture at each epoch and, jointly, this process ...
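The train-time and test-time behavior described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the book's implementation: the function names `dropout_train` and `dropout_test` and the parameter `keep_prob` are hypothetical, and the test-time step scales the activations by 0.5, which has the same effect on the next layer as halving that layer's incoming weights.

```python
import numpy as np


def dropout_train(activations, keep_prob=0.5, rng=None):
    """Training-time dropout (hypothetical helper, for illustration only).

    Each neuron is kept with probability keep_prob and zeroed otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One Bernoulli(keep_prob) draw per neuron: 1 keeps it, 0 drops it.
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    return activations * mask, mask


def dropout_test(activations, keep_prob=0.5):
    """Test-time pass: every neuron is used, scaled by keep_prob.

    Scaling the activations here is equivalent to scaling the weights
    that consume them, which is the "halving" for keep_prob = 0.5.
    """
    return activations * keep_prob


rng = np.random.default_rng(0)
h = np.ones((2, 8))                      # toy hidden-layer activations
dropped, mask = dropout_train(h, rng=rng)
averaged = dropout_test(h)               # every entry equals 0.5
```

Because a fresh mask is drawn on every call, each training step effectively sees a different thinned network, while the test-time scaling keeps the expected input to the next layer the same as during training.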