April 2022
Intermediate to advanced
284 pages
5h 53m
English
So far, we have discussed all the mainstream distributed Deep Neural Network (DNN) model training and inference methodologies. Here, we want to illustrate some advanced techniques that can be used along with all the previous techniques we have.
In this chapter, we will mainly cover advanced techniques that can be applied generally to DNN training and serving. More specifically, we will discuss general performance debugging approaches, such as kernel event monitoring, job multiplexing, and heterogeneous model training.
Before we discuss anything further, we will list the assumptions we have for this chapter, as follows: