TensorFlow 2 Pocket Reference

by KC Tung
July 2021
Content level: Intermediate to advanced
253 pages
5h 1m
English
O'Reilly Media, Inc.

Chapter 3. Data Preprocessing

In this chapter, you’ll learn how to prepare and set up data for training. Some of the most common data formats for ML work are tables, images, and text. There are commonly practiced techniques associated with each, though how you set up your data engineering pipeline will, of course, depend on what your problem statement is and what you are trying to predict.

I’ll look at all three formats in detail, using specific examples to walk you through the techniques. All the data can be read directly into your Python runtime memory; however, this isn’t the most efficient way to use your compute resources. When I discuss text data, I’ll give particular attention to tokenization and dictionaries. By the end of this chapter, you’ll have learned how to prepare table, image, and text data for training.
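To make the memory point concrete, here is a minimal sketch of reading rows in small batches instead of loading a whole file at once. It uses only the Python standard library, not a TensorFlow API; the helper name `iter_batches`, the column names, and the in-memory sample CSV are all hypothetical stand-ins for a large file on disk.

```python
import csv
import io
from itertools import islice


def iter_batches(file_obj, batch_size):
    """Yield lists of row dicts, holding at most batch_size rows in memory at a time."""
    reader = csv.DictReader(file_obj)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical data: an in-memory CSV standing in for a large file on disk.
sample_csv = io.StringIO(
    "age,city\n34,Austin\n51,Boston\n28,Austin\n45,Denver\n39,Boston\n"
)

batch_sizes = [len(batch) for batch in iter_batches(sample_csv, batch_size=2)]
print(batch_sizes)
```

Only one batch is materialized at any moment, so peak memory depends on `batch_size` rather than on the size of the file.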

Preparing Tabular Data for Training

In a tabular dataset, it is important to identify which columns are categorical, because you have to encode their values as class labels or as binary vectors that represent the class (one-hot encoding), rather than treating them as numerical quantities. Another aspect of tabular datasets is the potential for interactions among multiple features. This section will also look at the API that TensorFlow provides to make it easier to model column interactions.
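As a rough illustration of the two ideas above, the following sketch one-hot encodes a categorical column and then builds a simple feature cross by treating each pair of values as its own category. It is plain Python written for clarity, not the TensorFlow API this section refers to; the helper names (`build_vocabulary`, `one_hot`), the columns `city` and `plan`, and the sample rows are all hypothetical.

```python
def build_vocabulary(values):
    """Map each distinct category to an index, in sorted order."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Return a binary vector with a single 1 at the category's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[value]] = 1
    return vector


# Hypothetical rows with two categorical columns.
rows = [
    {"city": "Austin", "plan": "basic"},
    {"city": "Boston", "plan": "premium"},
    {"city": "Austin", "plan": "premium"},
]

# One-hot encode a single categorical column.
city_vocab = build_vocabulary(row["city"] for row in rows)
city_encoded = [one_hot(row["city"], city_vocab) for row in rows]

# Feature cross: each (city, plan) combination becomes its own category,
# so a model can learn a weight for the interaction itself.
crossed = [f'{row["city"]}_x_{row["plan"]}' for row in rows]
cross_vocab = build_vocabulary(crossed)
cross_encoded = [one_hot(value, cross_vocab) for value in crossed]

print(city_encoded)
print(cross_encoded)
```

Crossing columns this way grows the vocabulary with the number of observed combinations, which is one reason libraries often hash crossed values into a fixed number of buckets instead of enumerating them.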

It’s common to encounter tabular datasets as CSV files or simply as structured output from a database query. For this example, we’ll start with a dataset that’s already in a pandas DataFrame and ...


Publisher Resources

ISBN: 9781492089179