The Microsoft common objects in context
Advances in application of deep learning in computer vision are often highly focalized on the kind of classification problems that can be summarized by challenges such as ImageNet (but also, for instance, PASCAL VOC - http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) and the ConvNets suitable to crack it (Xception, VGG16, VGG19, ResNet50, InceptionV3, and MobileNet, just to quote the ones available in the well-known package Keras
: https://keras.io/applications/).
Though deep learning networks based on ImageNet data are the actual state of the art, such networks can experience difficulties when faced with real-world applications. In fact, in practical applications, we have to process images that are quite different from the examples provided by ImageNet. In ImageNet the elements to be classified are clearly the only clear element present in the image, ideally set in an unobstructed way near the center of a neatly composed photo. In the reality of images...