Appendix II — Hardware Constraints for Transformer Models

Transformer models could not exist without optimized hardware. Memory and disk management remain critical design components, but computing power is the prerequisite: it would be nearly impossible to train the original Transformer, described in Chapter 2, Getting Started with the Architecture of the Transformer Model, without GPUs. GPUs are at the center of the battle for efficient transformer models.

This appendix to Chapter 3, Fine-Tuning BERT Models, will take you through the importance of GPUs in three steps:

  • The architecture and scale of transformers
  • CPUs versus GPUs
  • Implementing GPUs in PyTorch as an example of how any optimized framework manages devices (a minimal sketch follows this list)
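
As a preview of the third step, the following is a minimal sketch of the standard PyTorch idiom for detecting a GPU and moving tensors and layers onto it; the tensor shapes and the Linear layer are illustrative choices, not code from this book:

    import torch

    # Select the GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Tensors and model weights must live on the same device before use
    x = torch.randn(3, 512).to(device)            # illustrative batch of 512-dimensional vectors
    layer = torch.nn.Linear(512, 512).to(device)  # move the layer's weights as well
    y = layer(x)                                  # the computation runs on the GPU when one is present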

The Architecture and Scale of Transformers
