Appendix II — Hardware Constraints for Transformer Models

Transformer models could not exist without optimized hardware. Memory and disk management remain critical design components, but computing power is the prerequisite: it would be nearly impossible to train the original Transformer, described in Chapter 2, Getting Started with the Architecture of the Transformer Model, without GPUs. GPUs are at the center of the battle for efficient transformer models.

This appendix to Chapter 3, Fine-Tuning BERT Models, will take you through the importance of GPUs in three steps:

  • The architecture and scale of transformers
  • CPUs versus GPUs
  • Implementing GPUs in PyTorch as an example of how any optimized framework manages devices (a minimal sketch follows this list)
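
As a preview of the third step, the following is a minimal sketch of the standard PyTorch idiom for detecting a GPU and moving tensors and layers onto it; the tensor shapes and the Linear layer are illustrative choices, not code from this book:

    import torch

    # Select the GPU if one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")

    # Tensors and model weights must live on the same device before use
    x = torch.randn(3, 512).to(device)            # illustrative batch of 512-dimensional vectors
    layer = torch.nn.Linear(512, 512).to(device)  # move the layer's weights as well
    y = layer(x)                                  # the computation runs on the GPU when one is present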

The Architecture and Scale of Transformers
