Scaling Machine Learning with Spark
by Adi Polak
March 2023
Intermediate to advanced
291 pages
8h 54m
English
O'Reilly Media, Inc.

Chapter 8. TensorFlow Distributed Machine Learning Approach

TensorFlow (TF) is an open source software library developed by the Google Brain team to further advance deep learning in the industry. Their goal was, and still is, to close the gap between research and practice.

When TF was released in 2015, it blew the data science crowd away. Today, it’s one of the most used libraries for deep learning. To provide a holistic solution allowing for a full production pipeline, the TF team released TensorFlow Extended (TFX) to the public in 2019. On top of that, Google created its own processing units, called tensor processing units (TPUs), to accelerate machine learning workloads that are developed with TF. If the acronym looks familiar, that’s because it’s intentionally similar to GPU, which stands for graphics processing unit. While TPUs provide some advanced capabilities, using them largely ties your technology stack to Google. GPUs are more vendor-agnostic and flexible, so using them as accelerators keeps your application’s hardware plan more portable across platforms.

TF provides various distributed training strategies for GPUs, CPUs, and TPUs. Using TF, you can enrich your machine learning capabilities beyond what Apache Spark provides out of the box. To connect the machine learning workflow of preprocessing the data with Spark and training a TF model, you can use MLflow (discussed in Chapter 3).
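To make the idea of a distributed training strategy more concrete, here is a minimal conceptual sketch of synchronous data parallelism, the pattern behind TF strategies such as `tf.distribute.MirroredStrategy`. It is plain Python and does not call TensorFlow; the helper names (`shard`, `local_gradient`, `all_reduce_mean`, `train`), the toy dataset, and the hyperparameters are all assumptions made for illustration. Each simulated replica computes a gradient on its own slice of the batch, the gradients are averaged, and every replica applies the same update.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, label y)


def shard(batch: List[Sample], num_replicas: int) -> List[List[Sample]]:
    """Split a batch round-robin so each replica gets its own slice."""
    return [batch[i::num_replicas] for i in range(num_replicas)]


def local_gradient(w: float, samples: List[Sample]) -> float:
    """Gradient of the mean squared error of y = w * x on one replica's slice."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every replica ends up with the same average.

    With equally sized shards (as here), the average of the per-replica
    gradients equals the gradient over the whole batch.
    """
    return sum(values) / len(values)


def train(batch: List[Sample], num_replicas: int = 2,
          lr: float = 0.05, steps: int = 200) -> float:
    """Run synchronous data-parallel gradient descent on a single weight."""
    w = 0.0  # the weight is mirrored: identical on every replica
    shards = shard(batch, num_replicas)
    for _ in range(steps):
        # Each replica computes a gradient on its own shard (in parallel
        # on real hardware; sequentially in this simulation).
        grads = [local_gradient(w, s) for s in shards]
        # All replicas apply the same averaged gradient, keeping w in sync.
        w -= lr * all_reduce_mean(grads)
    return w


# Hypothetical toy data drawn from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 5)]
weight = train(data)
print(f"learned weight: {weight:.4f}")
```

In a real TF program the framework handles the sharding and the all-reduce step for you; this sketch only shows why every replica ends each step with identical weights.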

In the previous chapter, we discussed how to bridge Spark and TensorFlow by using Petastorm ...

ISBN: 9781098106812