April 2022
284 pages
In the previous chapter, we discussed the two mainstream data parallel training paradigms: parameter server and All-Reduce. Because of the shortcomings of the parameter server paradigm, All-Reduce has become the dominant architecture for data parallel training, so we will use the All-Reduce paradigm to illustrate our implementation.
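As a quick reminder of what All-Reduce does to gradients, here is a minimal single-process sketch. It only simulates the result of the collective: the function name `all_reduce_mean` and the toy gradient values are hypothetical, and no real communication backend is involved. In an actual training job this step would be performed by a collective call such as `torch.distributed.all_reduce`.

```python
from typing import List


def all_reduce_mean(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate All-Reduce (sum, then average) across workers in one process.

    Hypothetical helper for illustration only; it models the outcome of the
    collective, not the communication pattern a real backend would use.
    """
    num_workers = len(worker_grads)
    # Element-wise sum of every worker's local gradient vector.
    summed = [sum(values) for values in zip(*worker_grads)]
    # Average so that each worker applies the same update.
    averaged = [s / num_workers for s in summed]
    # Every worker ends up with its own copy of the identical result.
    return [list(averaged) for _ in range(num_workers)]


# Three simulated workers, each holding gradients from its own data shard.
local_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
synced = all_reduce_mean(local_grads)
print(synced)
```

The key property is that no central node holds the aggregated result: after the operation, all workers hold the same averaged gradients and can update their model replicas independently.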
In this chapter, we will focus mainly on the coding side of data parallelism. Before we dive into the details, here are the assumptions behind the implementations in this chapter: