Noisy networks are not those networks that need to know everything—those would be nosey networks. Instead, noisy networks introduce the concept of noise into the weights used to predict the output through the network. So, instead of having a single scalar value to denote the weight in a perceptron, we now think of weights as being pulled from some form of distribution. Obviously, we have a common theme going on here and that is going from working with numbers as single scalar values to what is better described as a distribution of data. If you have studied the subject of Bayesian or variational inference, you will likely understand this concept concretely.
For those without that background, let's look at what ...