Residual networks (ResNets, https://arxiv.org/abs/1512.03385) were introduced in 2015, when they won first place in five main tracks of the ImageNet and COCO competitions that year. In Chapter 2, Neural Networks, we mentioned that the layers of a neural network are not restricted to a sequential order, but form a graph instead. This is the first architecture we'll study that takes advantage of this flexibility. It is also the first architecture to successfully train networks more than 100 layers deep.
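To make the idea concrete before we go further, here is a minimal sketch of the core building block of a ResNet: a residual block that computes F(x) + x, where the input x "skips over" the transformation F and is added back to its output. This toy version uses two fully connected layers with NumPy rather than the convolutional layers of the actual paper; the function name and weight parameters are illustrative, not from the source.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, w1, w2):
    """Toy residual block: relu(F(x) + x), with F being two linear layers.

    x:  input batch of shape (batch, features)
    w1, w2: weight matrices of shape (features, features), so that
            F(x) keeps the same shape as x and the addition is valid.
    """
    out = relu(x @ w1)     # first transformation + non-linearity
    out = out @ w2         # second transformation
    return relu(out + x)   # the skip connection: add the input back

# Usage: the output shape always matches the input shape
x = np.random.randn(4, 8)
y = residual_block(x, np.random.randn(8, 8), np.random.randn(8, 8))
print(y.shape)  # (4, 8)
```

Note that if F(x) collapses to zero (for example, with zero weights), the block simply passes its input through, which is exactly the property that makes very deep stacks of such blocks trainable.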
Thanks to better weight initializations, new activation functions, and normalization layers, it's now possible to train deep networks. Still, the authors of the paper conducted some experiments and observed that a ...