Until now, we've seen that nodes act like functions and are chained together into a network. Another way to think about NNs is as nested functions, which is exactly what they are. By settling on the right weights, they can approximate almost any function and thereby handle the task at hand; this is why they are called universal function approximators.
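To make the "nested functions" view concrete, here is a minimal sketch of a two-layer network written as nothing more than function composition. The layer sizes, weights, and the small `layer` helper are illustrative assumptions, not part of any particular library:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def identity(z):
    return z

def layer(W, b, x, activation):
    """One layer of nodes: an affine map followed by a nonlinearity."""
    return activation(W @ x + b)

# Hypothetical weights for a tiny network mapping R^3 -> R^1.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])

# The whole network is just nested function calls: f(x) = layer2(layer1(x)).
y = layer(W2, b2, layer(W1, b1, x, relu), identity)
print(y)
```

Changing the weights changes which function the composition computes; training is the search for the weights whose composed function best matches the task.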
Finding the best weights is essentially an optimization task. The nesting and the typically huge number of parameters make this task challenging, and neural nets only became practical once efficient training algorithms appeared. Backpropagation was the first method to take on that job.
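The sketch below illustrates the idea on a one-hidden-layer network: a forward pass through the nested functions, a backward pass that applies the chain rule layer by layer (backpropagation), and a gradient descent update of every weight. The toy target function, layer sizes, and learning rate are arbitrary choices made for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = sin(x) on a small interval (illustrative choice).
X = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
Y = np.sin(X)

# One hidden layer with tanh units; the weights are what we optimize.
W1, b1 = rng.standard_normal((1, 16)) * 0.5, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.5, np.zeros(1)

lr = 0.05
for step in range(2000):
    # Forward pass: evaluate the nested functions, ending in the loss.
    H = np.tanh(X @ W1 + b1)          # hidden activations
    pred = H @ W2 + b2                # network output
    err = pred - Y
    loss = np.mean(err ** 2)

    # Backward pass: chain rule from the loss back to each weight.
    d_pred = 2 * err / len(X)         # dLoss / dpred
    dW2 = H.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_H = d_pred @ W2.T
    d_pre = d_H * (1 - H ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)

    # Gradient descent step on every parameter.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final loss: {loss:.4f}")
```

The key point is the backward pass: because the network is a nest of differentiable functions, the gradient of the loss with respect to every weight can be computed efficiently by reusing the intermediate results of the forward pass.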