5

Building an Efficient Data Pipeline

Machine learning is grounded in data. Put simply, the training process feeds a neural network with large volumes of data, such as images, videos, sound, and text. Thus, apart from the training algorithm itself, data loading is an essential part of the entire model-building process.

Deep learning models deal with huge amounts of data, such as thousands of images and terabytes of text sequences. As a consequence, tasks related to data loading, preparation, and augmentation can severely delay the training process as a whole. To keep these tasks from becoming a bottleneck in the model-building process, we must guarantee an uninterrupted flow of dataset samples to the training process.
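To make the idea of an uninterrupted flow concrete, here is a minimal sketch using only the Python standard library. It is not the chapter's method: `load_sample`, `train_step`, and `prefetching_loader` are hypothetical names, and the `time.sleep` calls are assumed stand-ins for real loading and training costs. A background thread loads samples into a bounded queue so that loading overlaps with training instead of alternating with it.

```python
import queue
import threading
import time


def load_sample(index):
    """Hypothetical stand-in for reading and preprocessing one sample."""
    time.sleep(0.01)  # assumed cost of I/O, preparation, and augmentation
    return index * 2


def train_step(sample):
    """Hypothetical stand-in for one training step on a sample."""
    time.sleep(0.01)  # assumed cost of the forward and backward passes
    return sample


def prefetching_loader(num_samples, buffer_size=4):
    """Yield samples that a background thread loads ahead of the consumer."""
    buffer = queue.Queue(maxsize=buffer_size)  # bounded, so memory stays capped
    sentinel = object()  # marks the end of the dataset

    def producer():
        for i in range(num_samples):
            buffer.put(load_sample(i))  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks only if loading has fallen behind
        if item is sentinel:
            return
        yield item


# While train_step works on one sample, the producer is loading the next ones.
results = [train_step(sample) for sample in prefetching_loader(8)]
```

In PyTorch, the `num_workers` and `prefetch_factor` parameters of `torch.utils.data.DataLoader` serve a comparable purpose, using worker processes rather than a single thread.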

In this chapter, ...

Get Accelerate Model Training with PyTorch 2.X now with the O’Reilly learning platform.