In the paper Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, Google researchers constructed an internal dataset of 300 million observations, far larger than ImageNet. They then trained several state-of-the-art architectures on this dataset, increasing the amount of data shown to each model from 10 million to 30 million, 100 million, and finally 300 million observations. They found that model performance increased linearly with the logarithm of the number of training observations, suggesting that, in the source domain, more data consistently helps.
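This log-linear trend can be written as performance ≈ a · log(N) + b, where N is the number of training observations. As a minimal sketch of what fitting such a trend looks like, the following Python snippet uses hypothetical accuracy values (not the paper's actual results) at the four dataset sizes:

    import numpy as np

    # Hypothetical scores at each dataset size (illustrative only,
    # not the numbers reported in the paper)
    n_examples = np.array([10e6, 30e6, 100e6, 300e6])
    performance = np.array([0.60, 0.64, 0.68, 0.72])

    # Fit performance = a * log10(N) + b
    a, b = np.polyfit(np.log10(n_examples), performance, deg=1)
    print(f"performance ~ {a:.3f} * log10(N) + {b:.3f}")

A straight line on a log-scaled x-axis is exactly this behavior: each tenfold increase in data buys a roughly constant improvement, so the gains never stop, but each additional gain requires exponentially more data.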
But what about the target domain? We repeated the Google experiment using a few ...