Chapter 8. TensorFlow Distributed Machine Learning Approach

TensorFlow (TF) is an open source software library developed by the Google Brain team to further advance deep learning in the industry. Their goal was, and still is, to close the gap between research and practice.

When TF was released in 2015, it blew the data science crowd away. Today, it’s one of the most widely used libraries for deep learning. To provide a holistic solution that supports a full production pipeline, the TF team released TensorFlow Extended (TFX) to the public in 2019. On top of that, Google created its own processing units, called tensor processing units (TPUs), to accelerate machine learning workloads developed with TF. If the acronym looks familiar, that’s because it’s intentionally similar to GPU, which stands for graphics processing unit. While TPUs provide some advanced capabilities, using them largely ties your technology stack to Google. GPUs are more vendor-agnostic and flexible, so using them as accelerators keeps your application’s hardware plan more portable across platforms.

TF provides various distributed training strategies for GPUs, CPUs, and TPUs. Using TF, you can enrich your machine learning capabilities beyond what Apache Spark provides out of the box. To connect the two stages of the machine learning workflow, preprocessing the data with Spark and training a TF model, you can use MLflow (discussed in Chapter 3).
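To make the idea of choosing a distributed training strategy more concrete, here is a minimal sketch, not a definitive recipe. The helper names pick_strategy_name and build_local_strategy are hypothetical and introduced only for illustration; the strategy classes they refer to (MirroredStrategy, MultiWorkerMirroredStrategy, TPUStrategy, and OneDeviceStrategy) are part of the tf.distribute module. The mapping from hardware layout to strategy is a simplifying assumption, and real projects may weigh other factors.

```python
def pick_strategy_name(num_gpus=0, num_workers=1, has_tpu=False):
    """Map a hardware layout to the name of a tf.distribute strategy class.

    Hypothetical helper: the decision rules are a simplifying assumption.
    """
    if has_tpu:
        return "TPUStrategy"
    if num_workers > 1:
        return "MultiWorkerMirroredStrategy"
    if num_gpus > 1:
        return "MirroredStrategy"
    return "OneDeviceStrategy"


def build_local_strategy():
    """Create a strategy for the local machine only (requires TensorFlow).

    TPU and multi-worker setups need extra cluster configuration,
    so this sketch deliberately leaves them out.
    """
    # Imported lazily so that pick_strategy_name works without TensorFlow.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if pick_strategy_name(num_gpus=len(gpus)) == "MirroredStrategy":
        # Replicates the model on every visible GPU of this machine.
        return tf.distribute.MirroredStrategy()
    # Zero or one GPU: place all computation on a single device.
    device = "/gpu:0" if gpus else "/cpu:0"
    return tf.distribute.OneDeviceStrategy(device=device)
```

Once a strategy object exists, the model and optimizer are typically created inside a `with strategy.scope():` block so that TF can place and replicate their variables according to the chosen strategy.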

In the previous chapter, we discussed how to bridge Spark and TensorFlow by using Petastorm ...

