Chapter 5: Training at Scale

When we build and train more complex models, or ingest large amounts of data in a pipeline, we naturally want to use all the compute time and memory resources at our disposal more efficiently. That is the main purpose of this chapter: we are going to integrate what we learned in previous chapters with techniques for distributed training across a cluster of compute nodes.

TensorFlow has developed a high-level API for distributed training, and this API integrates very well with the Keras API. As it turns out, Keras is now a first-class citizen in the TensorFlow ecosystem; compared to the Estimator API, Keras receives the most support when it comes to distributed training.
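To make the pattern concrete, here is a minimal sketch (not taken verbatim from this book) of Keras training under TensorFlow's tf.distribute.MirroredStrategy, which replicates the model across the GPUs of a single machine. The layer sizes and the synthetic data are placeholders chosen purely for illustration:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy performs synchronous data-parallel training across all
# GPUs visible on this machine (it falls back to CPU if none are found).
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas in sync:", strategy.num_replicas_in_sync)

# Creating and compiling the model inside the strategy scope is all Keras
# needs in order to place variables and aggregate gradients across replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder synthetic data; in practice this would be a tf.data pipeline.
x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

# model.fit() then runs distributed training transparently.
model.fit(x, y, epochs=2, batch_size=64)
```

The same structure carries over to other strategies (for example, multi-worker training): only the strategy object changes, while the model-building and fit() code stays the same.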
