
Distributed Machine Learning with Python

by Guanhua Wang
April 2022
Intermediate to advanced
284 pages
5h 53m
English
Packt Publishing
Content preview from Distributed Machine Learning with Python

Chapter 3: Building a Data Parallel Training and Serving Pipeline

In the previous chapter, we discussed the two mainstream data parallel training paradigms: parameter server and All-Reduce. Because of the parameter server paradigm's shortcomings, All-Reduce has become the dominant architecture for data parallel training, and it is the paradigm we will use for our implementation.
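As a rough illustration of the All-Reduce paradigm (a minimal sketch of our own, not the book's code), after each backward pass every worker can sum its local gradients with all peers and divide by the number of workers, so that every replica applies an identical update. The helper name average_gradients is ours; it uses PyTorch's torch.distributed.all_reduce:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """All-Reduce each parameter's gradient, then average by world size."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # Sum this gradient tensor across all workers, in place.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            # Divide by the number of workers so every replica
            # ends up holding the same averaged gradient.
            param.grad /= world_size
```

Calling average_gradients(model) between loss.backward() and optimizer.step() keeps all model replicas in sync without any central parameter server.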

In this chapter, we will focus mainly on the coding side of data parallelism. Before we dive into the details, let's list the assumptions we make for the implementations in this chapter (a setup sketch follows the list):

  • We will use homogeneous hardware for all our training nodes.
  • All our training nodes will be used exclusively for a single job, which means no resource sharing ...
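Under these assumptions, a typical All-Reduce setup launches one process per GPU on each dedicated, homogeneous node. The skeleton below is a sketch under those assumptions, not the book's listing; the function name setup_worker is ours, and it presumes the process was started by a launcher such as torchrun, which sets the LOCAL_RANK and rendezvous environment variables:

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_worker(model: torch.nn.Module) -> DDP:
    """Initialize one training process per GPU and wrap the model in DDP."""
    # NCCL is the usual backend for GPU-to-GPU All-Reduce.
    dist.init_process_group(backend="nccl")
    # LOCAL_RANK is set by the launcher (e.g., torchrun).
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # DDP performs the All-Reduce on gradients automatically
    # during the backward pass.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```

With this wrapper in place, the training loop on each node looks like ordinary single-GPU code; gradient synchronization happens inside DDP.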


ISBN: 9781801815697