Chapter 5: Splitting the Model
In this chapter, we will discuss how to train giant models with model parallelism. Giant models are models that are too large to fit into a single GPU's memory; examples include Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) family, such as GPT-2 and GPT-3.
In contrast to data parallel workloads, model parallelism is most often adopted for language models. Language models are a specific type of deep learning model used in the Natural Language Processing (NLP) domain, where the input data is usually a text sequence. The model outputs predictions for tasks such as question answering and next-sentence prediction.
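To make the idea concrete, the following is a minimal sketch of model parallelism in PyTorch, not the book's own code: the two halves of a toy network are placed on different GPUs, and the intermediate activation is moved between devices inside forward(). The device names ("cuda:0"/"cuda:1") and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the model lives on GPU 0.
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        # Second half lives on GPU 1, so neither GPU has to hold the full model.
        self.part2 = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        # Transfer the intermediate activation to the second GPU.
        x = self.part2(x.to("cuda:1"))
        return x

model = TwoGPUModel()
out = model(torch.randn(8, 1024))  # the output tensor resides on cuda:1
```

Note that, unlike data parallelism, each GPU here holds only a slice of the parameters, at the cost of inter-GPU activation transfers on every forward pass.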
NLP model training is often segregated into two phases: pre-training a base model on a large corpus, and fine-tuning that base model for specific downstream tasks.
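The sketch below illustrates the fine-tuning phase under the assumption that a pre-trained base model is already available: the base encoder's weights are frozen and only a small task-specific head is trained. All module names and sizes here are illustrative assumptions, not the book's code.

```python
import torch
import torch.nn as nn

base_encoder = nn.Sequential(          # stand-in for a pre-trained NLP base model
    nn.Embedding(30000, 768),
    nn.Linear(768, 768),
    nn.ReLU(),
)
for param in base_encoder.parameters():
    param.requires_grad = False        # keep the pre-trained weights fixed

task_head = nn.Linear(768, 2)          # e.g., a two-class downstream classifier
optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-4)

tokens = torch.randint(0, 30000, (8, 16))   # dummy batch of token IDs
labels = torch.randint(0, 2, (8,))          # dummy task labels

features = base_encoder(tokens).mean(dim=1)  # pool over the sequence dimension
loss = nn.functional.cross_entropy(task_head(features), labels)
loss.backward()                              # gradients flow only into task_head
optimizer.step()
```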