Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition

by Aurélien Géron
October 2022
Content level: Intermediate to advanced
864 pages
25h 31m
English
O'Reilly Media, Inc.
Content preview from Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition

Chapter 13. Loading and Preprocessing Data with TensorFlow

In Chapter 2, you saw that loading and preprocessing data is an important part of any machine learning project. You used Pandas to load and explore the (modified) California housing dataset—which was stored in a CSV file—and you applied Scikit-Learn’s transformers for preprocessing. These tools are quite convenient, and you will probably be using them often, especially when exploring and experimenting with data.

However, when training TensorFlow models on large datasets, you may prefer to use TensorFlow’s own data loading and preprocessing API, called tf.data. It is capable of loading and preprocessing data extremely efficiently, reading from multiple files in parallel using multithreading and queuing, shuffling and batching samples, and more. Plus, it can do all of this on the fly—it loads and preprocesses the next batch of data across multiple CPU cores, while your GPUs or TPUs are busy training the current batch of data.
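The shuffling, batching, and on-the-fly preprocessing described above can be sketched with a small tf.data pipeline. This is only an illustrative sketch: the toy in-memory tensor, the doubling step used as "preprocessing", the buffer size, and the batch size are all assumptions chosen for brevity, not values taken from the chapter.

```python
import tensorflow as tf

# Toy in-memory data standing in for a large dataset (an assumption for illustration).
features = tf.range(10)

dataset = (
    tf.data.Dataset.from_tensor_slices(features)
    # Shuffle the samples using a buffer that holds all 10 elements.
    .shuffle(buffer_size=10, seed=42)
    # Hypothetical preprocessing step, run in parallel across CPU cores.
    .map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
    # Group the samples into batches of 4 (the last batch is smaller).
    .batch(4)
    # Prepare the next batch while the current one is being consumed.
    .prefetch(tf.data.AUTOTUNE)
)

# Collect the batches as plain Python lists for inspection.
batches = [batch.numpy().tolist() for batch in dataset]
```

The final prefetch() call is what lets the pipeline work on the next batch while the current one is being used for training. The order of the values inside the batches depends on the shuffle, so only the batch sizes (4, 4, and 2) are predictable here.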

The tf.data API lets you handle datasets that don’t fit in memory, and it allows you to make full use of your hardware resources, thereby speeding up training. Off the shelf, the tf.data API can read from text files (such as CSV files), binary files with fixed-size records, and binary files that use TensorFlow’s TFRecord format, which supports records of varying sizes.
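As a rough sketch of the three file types mentioned above, the following code writes one tiny file of each kind to a temporary directory and reads it back with the corresponding tf.data class. The filenames, the file contents, and the 4-byte record size are hypothetical choices made for illustration only.

```python
import os
import tempfile

import tensorflow as tf

tmp_dir = tempfile.mkdtemp()

# 1. Text file (CSV): each line becomes one string element.
csv_path = os.path.join(tmp_dir, "sample.csv")  # hypothetical filename
with open(csv_path, "w") as f:
    f.write("a,b\n1,2\n3,4\n")
text_ds = tf.data.TextLineDataset(csv_path).skip(1)  # skip the header line
lines = [line.numpy().decode() for line in text_ds]

# 2. Binary file with fixed-size records (4 bytes each, an assumed size).
bin_path = os.path.join(tmp_dir, "sample.bin")  # hypothetical filename
with open(bin_path, "wb") as f:
    f.write(b"AAAABBBBCCCC")
fixed_ds = tf.data.FixedLengthRecordDataset(bin_path, record_bytes=4)
fixed_records = [record.numpy() for record in fixed_ds]

# 3. TFRecord file: records may have different lengths.
tfrecord_path = os.path.join(tmp_dir, "sample.tfrecord")  # hypothetical filename
with tf.io.TFRecordWriter(tfrecord_path) as writer:
    writer.write(b"short")
    writer.write(b"a much longer record")
tfrecord_ds = tf.data.TFRecordDataset(tfrecord_path)
var_records = [record.numpy() for record in tfrecord_ds]
```

In each case the dataset yields raw strings or bytes. Parsing them into numeric tensors would be a separate step, typically applied with the dataset's map() method.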

TFRecord is a flexible and efficient binary format usually containing protocol buffers (an open source binary format). The tf.data API ...


Publisher Resources

ISBN: 9781098125967