A very important element is the initial configuration of a neural network. How should the weights be initialized? Let's imagine that we have set them all to zero. As all neurons in a layer receive the same input, if the weights are 0 (or any other common constant value), the outputs will be identical. When the gradient correction is applied, all neurons will be treated in the same way; therefore, the network is equivalent to a sequence of single-neuron layers. It's clear that the initial weights must be different in order to achieve what is called symmetry breaking, but what is the best choice?
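The following is a minimal NumPy sketch (an illustrative example, not taken from the text) showing the effect described above: when every weight of a small one-hidden-layer network starts at the same constant, all hidden neurons receive identical gradients at every step and therefore remain copies of each other, no matter how long we train.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 100 samples, 3 features, 1 target (purely illustrative)
X = rng.normal(size=(100, 3))
y = rng.normal(size=(100, 1))

# One hidden layer with 4 tanh units; every weight starts at the SAME constant
W1 = np.full((3, 4), 0.5)   # input -> hidden
W2 = np.full((4, 1), 0.5)   # hidden -> output
lr = 0.01

for _ in range(50):
    # Forward pass
    H = np.tanh(X @ W1)            # hidden activations
    Y_hat = H @ W2                 # linear output
    # Backward pass (MSE loss)
    dY = 2.0 * (Y_hat - y) / len(X)
    dW2 = H.T @ dY
    dH = dY @ W2.T
    dW1 = X.T @ (dH * (1.0 - H ** 2))
    # Gradient step
    W1 -= lr * dW1
    W2 -= lr * dW2

# Every column of W1 (one per hidden neuron) is still identical to the first:
print(np.allclose(W1, W1[:, [0]]))  # True -> all 4 neurons compute the same function
```

Because the columns of W1 never diverge, the hidden layer behaves like a single neuron repeated four times, which is exactly the degenerate situation that a random (symmetry-breaking) initialization avoids.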
If we knew (even approximately) the final configuration, we could set the weights so as to reach the optimal point in a few iterations, but, unfortunately, ...