Building Machine Learning Pipelines by Catherine Nelson, Hannes Hapke

Chapter 3. Data Validation with TensorFlow

In the last two chapters, we introduced how to orchestrate machine learning workflows and manage their metadata. With this in mind, we can start building our workflows.

Data is the basis for every machine learning model, and the model’s usefulness and performance depend on the data used to train, validate, and analyze the model. As you can imagine, without robust data, we can’t build robust models. You might have heard the phrase “garbage in, garbage out,” meaning that our models won’t perform if the underlying data isn’t curated and validated. This is the exact purpose of the first workflow step in our machine learning pipeline: data validation.

In this chapter, we introduce you to a Python package from the TensorFlow ecosystem called TensorFlow Data Validation. We show you how to set up the package in your data science projects, walk you through the common use cases, and highlight some useful workflows.

TensorFlow Data Validation assists you in comparing multiple data sets with each other. It highlights if your data schema changes over time (data drift) or if your training data differs significantly from the data used to validate your model or from the data the model sees at inference time (data skew).
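To make these comparisons concrete, here is a minimal sketch in plain Python. It does not use TensorFlow Data Validation or its API. The helper names (`infer_schema`, `schema_anomalies`, `linf_distance`), the toy records, and the choice of distance measure are all assumptions made for illustration. The sketch infers a simple schema from a reference data set, checks a second data set against it, and measures how far the category frequencies of one column have moved between the two.

```python
from collections import Counter


def infer_schema(rows):
    """Infer a toy schema: the set of Python type names seen per column."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema


def schema_anomalies(schema, rows):
    """List columns that are missing, unexpected, or hold an unseen type."""
    observed = infer_schema(rows)
    anomalies = []
    for column in sorted(schema.keys() - observed.keys()):
        anomalies.append(f"missing column: {column}")
    for column in sorted(observed.keys() - schema.keys()):
        anomalies.append(f"unexpected column: {column}")
    for column in sorted(schema.keys() & observed.keys()):
        new_types = observed[column] - schema[column]
        if new_types:
            anomalies.append(f"{column}: unexpected type(s) {sorted(new_types)}")
    return anomalies


def linf_distance(reference, other, column):
    """Largest absolute gap between the category frequencies of one column."""
    ref_counts = Counter(row[column] for row in reference if column in row)
    oth_counts = Counter(row[column] for row in other if column in row)
    ref_total = sum(ref_counts.values()) or 1  # avoid division by zero
    oth_total = sum(oth_counts.values()) or 1
    categories = ref_counts.keys() | oth_counts.keys()
    return max(
        (abs(ref_counts[c] / ref_total - oth_counts[c] / oth_total)
         for c in categories),
        default=0.0,
    )


# Hypothetical records standing in for a training set and a serving set.
training = [
    {"age": 34, "country": "US"},
    {"age": 51, "country": "US"},
    {"age": 27, "country": "DE"},
    {"age": 45, "country": "US"},
]
serving = [
    {"age": "34", "country": "DE"},  # age arrives as a string here
    {"age": 29, "country": "DE"},
    {"age": 40, "country": "US"},
    {"age": 38, "country": "DE"},
]

schema = infer_schema(training)
anomalies = schema_anomalies(schema, serving)
country_gap = linf_distance(training, serving, "country")

print(anomalies)
print(country_gap)
```

In practice, a distance like `country_gap` would be compared against a threshold chosen for the feature, and the data set would be flagged when the threshold is exceeded. The threshold itself is a judgment call that depends on the data.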

At the end of the chapter, we integrate our first workflow step into our Airflow pipelines.

Why Data Validation?

In machine learning, we are trying to learn from patterns in data sets ...
