Hands-On Convolutional Neural Networks with TensorFlow
by Iffat Zafar, Giounona Tzanidou, Richard Burton, Nimesh Patel, Leonardo Araujo
Sharding
Although we said that it is best to have all our data in one file, this is not entirely true. Because TFRecords are read sequentially, we cannot shuffle our dataset if we use just one file. Each time you reach the end of the TFRecord after an epoch of training, you go back to the start of the file and, unfortunately, the data arrives in exactly the same order as in every previous epoch.
To make shuffling possible, we can shard our data: create multiple TFRecord files and spread the data across them. We can then shuffle the order in which we load the TFRecord files each epoch, so our data is effectively shuffled for us while we train. Something like ...
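The idea can be illustrated with a minimal, framework-free sketch. It is not the book's own code: the helper names (`shard_filenames`, `assign_to_shards`, `epoch_order`), the file-naming scheme, and the round-robin assignment are all assumptions made for illustration, and plain Python lists stand in for the TFRecord files.

```python
import random


def shard_filenames(prefix, num_shards):
    """Build one file name per shard (the naming scheme is an assumption)."""
    return [f"{prefix}-{i:05d}-of-{num_shards:05d}.tfrecord"
            for i in range(num_shards)]


def assign_to_shards(examples, num_shards):
    """Spread examples across shards round-robin."""
    shards = [[] for _ in range(num_shards)]
    for index, example in enumerate(examples):
        shards[index % num_shards].append(example)
    return shards


def epoch_order(shards, rng):
    """Yield one epoch of examples, visiting the shards in a random order.

    Each shard is still read sequentially from start to end, just as a
    single TFRecord file would be.
    """
    order = list(range(len(shards)))
    rng.shuffle(order)
    for shard_index in order:
        yield from shards[shard_index]


examples = list(range(10))                 # stand-ins for serialized records
filenames = shard_filenames("train", 4)    # hypothetical output file names
shards = assign_to_shards(examples, num_shards=4)

rng = random.Random(0)
epoch_1 = list(epoch_order(shards, rng))   # one pass over all shards
epoch_2 = list(epoch_order(shards, rng))   # shard order is drawn again
```

Note that this shuffles only at the file level: examples inside a shard keep their relative order, so more, smaller shards give a finer-grained shuffle. In TensorFlow itself, `tf.data.Dataset.list_files(pattern, shuffle=True)` provides a shuffled list of file names that can then be read with `tf.data.TFRecordDataset`.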