ResNet solves these problems by explicitly letting the layers in the network fit a residual mapping, implemented by adding a shortcut connection. The following image shows how ResNet works:
In all the networks we have seen so far, we try to find a function that maps the input x to its output H(x) by stacking layers. The authors of ResNet proposed a different approach: instead of trying to learn the underlying mapping from x to H(x) directly, the layers learn the difference between the two, the residual. To calculate H(x), we can then just add the residual back to the input. If the residual is F(x) = H(x) - x, then instead of trying to learn H(x) directly, we try to learn F(x), and the block's output becomes F(x) + x, which recovers H(x).
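To make this concrete, here is a minimal sketch of a residual block in PyTorch; the text does not specify a framework, so PyTorch, the class name ResidualBlock, and the channel sizes are illustrative assumptions. The stacked layers compute F(x), and the shortcut connection adds x back before the final activation:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        # These layers compute the residual F(x), not H(x) itself
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(residual + x)  # shortcut connection: H(x) = F(x) + x


# Usage: the block maps a feature map to another of the same shape
x = torch.randn(1, 64, 32, 32)
block = ResidualBlock(64)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```

Note that the shortcut adds no extra parameters: if the identity mapping is close to optimal, the layers only need to push F(x) toward zero, which is easier than fitting H(x) from scratch.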