December 2018
Intermediate to advanced
158 pages
3h 58m
English
Using torch.distributed is probably the most common approach. This package provides communication primitives, along with functions to query the number of processes in a job, check which communication backends are available, and initialize process groups. It operates at the module level: the torch.nn.parallel.DistributedDataParallel() class is a container that wraps a PyTorch model and uses torch.distributed to synchronize gradients across processes. The most common use case involves multiple processes that each operate on their own GPU, either on a single machine or over a network. Each participating process joins the process group with a call like the following:
torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
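To see the pieces fit together without a multi-GPU machine, the following is a minimal sketch rather than a production setup. It assumes the 'gloo' backend, a single process (world_size=1, rank=0), and a file-based rendezvous in a temporary directory; the function name run_single_process_demo and the tiny nn.Linear model are hypothetical, chosen only for illustration. A real job would typically use 'nccl' with one process per GPU.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def run_single_process_demo():
    """Initialize a one-process group, wrap a model, and run one backward pass."""
    # Assumption: 'gloo' and world_size=1 so the sketch runs on a CPU-only machine.
    # The rendezvous file lives in a fresh temporary directory.
    store_dir = tempfile.mkdtemp()
    init_method = "file://" + os.path.join(store_dir, "rendezvous")

    dist.init_process_group(
        backend="gloo", init_method=init_method, rank=0, world_size=1
    )
    try:
        # Query the group: how many processes there are, and which one this is.
        world_size = dist.get_world_size()
        rank = dist.get_rank()

        # Wrap an ordinary module; gradients are synchronized across the
        # process group during backward() (trivially so with one process).
        model = DistributedDataParallel(nn.Linear(4, 2))
        loss = model(torch.randn(8, 4)).sum()
        loss.backward()

        # The wrapped model is available as model.module.
        grad_shape = tuple(model.module.weight.grad.shape)
    finally:
        # Always release the process group, even if something above fails.
        dist.destroy_process_group()
    return world_size, rank, grad_shape


if __name__ == "__main__":
    print(run_single_process_demo())
```

With more than one process, each process would run the same function with its own rank and a shared init_method, and the model would usually be moved to that process's GPU before being wrapped.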