If there's a problem with the network being tied densely, just force it to be sparse. Then the vanishing gradient problem won't occur and learning can be done properly. The algorithm based on such an idea is the dropout algorithm. Dropout for deep neural networks was introduced in Improving neural networks by preventing co adaptation of feature detectors (Hinton, et. al. 2012, http://arxiv.org/pdf/1207.0580.pdf) and refined in Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava, et. al. 2014, https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf). In dropout, some of the units are, literally, forcibly dropped while training. What does this mean? Let's look at the following figures—firstly, neural networks: ...

Get Deep Learning: Practical Neural Networks with Java now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.