Residual Networks
In previous sections it was shown that the depth of a network is a crucial factor contributing to accuracy improvement (see VGG). It was also shown in Chapter 3, Image Classification in TensorFlow, that the problem of vanishing or exploding gradients in deep networks can be alleviated by correct weight initialization and batch normalization. Does this mean, however, that the more layers we add, the more accurate the resulting system becomes? The authors of Deep Residual Learning for Image Recognition, from Microsoft Research Asia, found that accuracy saturates once the network gets around 30 layers deep, and can even degrade as more layers are added.

To solve this problem, they introduced a new block of layers called the residual block, which adds the input of the block to the output of its stacked layers through a shortcut (skip) connection (refer to the figure below). Each block therefore only needs to learn a residual correction to its input, and gradients can flow directly through the shortcuts, which makes very deep networks trainable. The Residual Network, or ResNet, has shown excellent results with very deep networks (greater than even 100 layers!); for example, the 152-layer ResNet won the 2015 ILSVRC image classification challenge.
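To make the idea concrete, here is a minimal sketch of an identity residual block in tf.keras; it is not the original authors' code, and it assumes the input already has `filters` channels so that the elementwise addition is valid:

```python
import tensorflow as tf

def residual_block(x, filters):
    """A minimal identity residual block: two 3x3 conv layers plus a skip connection."""
    shortcut = x  # keep the block's input for the shortcut connection
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    # The key step: add the block's input to the output of the stacked layers,
    # so the layers only have to learn a residual correction.
    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.Activation('relu')(y)

# Usage sketch: stack a few blocks on a feature map with 64 channels
inputs = tf.keras.Input(shape=(32, 32, 64))
h = residual_block(inputs, 64)
h = residual_block(h, 64)
model = tf.keras.Model(inputs, h)
```

When the block changes the number of channels or the spatial resolution, full ResNets replace the identity shortcut with a 1x1 convolution that projects the input to the matching shape before the addition.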