For this problem, we have only 25,000 training examples. For a deep learning model, this is usually not enough data to capture all the details. No matter how sophisticated our network is or how much time we spend tuning it, at some point 25,000 examples will not be enough to improve the performance further.
Often, getting more data is very expensive or not possible at all. What we can do instead is generate more data from the data we already have. This is called data augmentation. Usually, we generate new data by applying some of the following transformations:
- Rotating the image
- Flipping the image
- Randomly cropping the image
- Switching the color channels (for example, swapping the red and blue channels)
- Changing ...
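
As a rough illustration of the first four transformations in the list, here is a minimal NumPy sketch. It is not the implementation used for this problem: the function name `augment`, the `crop_size` parameter, and the 50% probabilities are hypothetical choices made for the example, the image is assumed to be an H x W x 3 array in RGB order, and rotation is restricted to multiples of 90 degrees to keep the code short.

```python
import numpy as np


def augment(image, rng, crop_size):
    """Return a randomly transformed copy of an H x W x 3 image array.

    Illustrative sketch only; all names and probabilities are assumptions.
    """
    # Rotate by a random multiple of 90 degrees (0, 90, 180 or 270).
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k=k, axes=(0, 1))

    # Flip left to right half of the time.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]

    # Cut out a random crop_size x crop_size window.
    height, width = image.shape[:2]
    top = int(rng.integers(0, height - crop_size + 1))
    left = int(rng.integers(0, width - crop_size + 1))
    image = image[top:top + crop_size, left:left + crop_size, :]

    # Swap the red and blue channels half of the time (RGB -> BGR).
    if rng.random() < 0.5:
        image = image[:, :, ::-1]

    return image.copy()
```

Calling `augment(image, np.random.default_rng(), crop_size)` repeatedly on the same training image yields a different variant each time, so a single labeled example can stand in for many slightly different ones during training.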