Deep Learning with PyTorch

by Vishnu Subramanian

February 2018

Intermediate to advanced

262 pages

6h 59m

English

Packt Publishing

Data representativeness

In the example from the last chapter, we classified images as either dogs or cats. Consider a scenario where the images are sorted so that the first 60% are dogs and the remaining 40% are cats. If we split this dataset by taking the first 80% as the training set and the rest as the validation set, the validation set will not be representative of the full dataset, because it will contain only cat images. In such cases, we should ensure a good mix of classes in every split, either by shuffling the data before splitting or by using stratified sampling. Stratified sampling means drawing data points from each category, typically in proportion to that category's share of the dataset, to create the validation and test datasets.
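
To make the scenario above concrete, here is a minimal sketch in plain Python (standard library only) that compares three ways of splitting a sorted dataset of 60 dog labels followed by 40 cat labels. The function names `sequential_split`, `shuffled_split`, and `stratified_split`, the fixed seed, and the toy label list are assumptions made for illustration; they are not taken from the book's code.

```python
import random
from collections import Counter, defaultdict


def sequential_split(labels, train_frac=0.8):
    """Take the first train_frac of the indices for training, the rest for validation."""
    indices = list(range(len(labels)))
    cut = round(len(indices) * train_frac)
    return indices[:cut], indices[cut:]


def shuffled_split(labels, train_frac=0.8, seed=0):
    """Shuffle all indices first, then split as above."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_frac)
    return indices[:cut], indices[cut:]


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split each class separately so both sets keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train, val = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        cut = round(len(class_indices) * train_frac)
        train.extend(class_indices[:cut])
        val.extend(class_indices[cut:])
    return train, val


# Hypothetical sorted dataset: the first 60% are dogs, the rest are cats.
labels = ["dog"] * 60 + ["cat"] * 40

for name, split in [
    ("sequential", sequential_split),
    ("shuffled", shuffled_split),
    ("stratified", stratified_split),
]:
    _, val_idx = split(labels)
    print(name, dict(Counter(labels[i] for i in val_idx)))
```

With the sequential split, the validation set holds only the last 20 images, all of them cats. The shuffled split usually gives a reasonable mix, though the exact counts depend on the seed. The stratified split puts 12 dogs and 8 cats in the validation set, matching the 60/40 ratio of the full dataset.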

Publisher Resources

ISBN: 9781788624336
