The impact of source/target volume and similarity
Until fairly recently, there had been very little investigation into how data volume and source/target domain similarity affect transfer learning performance. It's a topic that matters for the usability of transfer learning, and one I've written about. In the paper Investigating the Impact of Data Volume and Domain Similarity on Transfer Learning Applications (https://arxiv.org/pdf/1712.04008.pdf), my colleagues Yuntao Li and Dingchao Zhang and I ran some experiments on these topics. Here's what we found.
More data is always beneficial
In the paper Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, Google researchers constructed an internal dataset of 300 million observations, far larger than ImageNet. They then trained several state-of-the-art architectures on this dataset, increasing the amount of...