Table of Contents
Preface
Section 1 – Data Parallelism
Chapter 1: Splitting Input Data
  Single-node training is too slow
    The mismatch between data loading bandwidth and model training bandwidth
    Single-node training time on popular datasets
  Accelerating the training process with data parallelism
  Data parallelism – the high-level bits
    Stochastic gradient descent
    Model synchronization
  Hyperparameter tuning
    Global batch size
    Learning rate adjustment
    Model synchronization schemes
  Summary
Chapter 2: Parameter Server and All-Reduce
  Technical requirements
  Parameter server architecture
    Communication bottleneck in the parameter server architecture
    Sharding the model among parameter servers
  Implementing the parameter server
    Defining model layers
    Defining ...