7. The Rise of Suprahuman Transformers with GPT-3 Engines
Brown et al. (2020) described the training of an OpenAI GPT-3 model containing 175 billion parameters that learned from huge datasets, such as the 400 billion byte-pair-encoded tokens extracted from Common Crawl data. OpenAI ran the training on a Microsoft Azure supercomputer with 285,000 CPUs and 10,000 GPUs.
The machine intelligence of OpenAI's GPT-3 engines and their supercomputer led Brown et al. (2020) to run zero-shot experiments. The idea was to use a trained model for downstream tasks without further training its parameters. The goal was for a trained model to go directly into multi-task production through an API that could even perform tasks it wasn't trained for.
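The sketch below illustrates the zero-shot idea: an already trained model receives the task description entirely in its prompt and produces an answer without any parameter updates. This is a minimal illustration only; since GPT-3 itself is reachable through OpenAI's API, the openly available GPT-2 model from the Hugging Face Transformers library stands in here, and the model name and prompt are assumptions made for demonstration.

# Minimal zero-shot prompting sketch. GPT-2 stands in for GPT-3;
# the model name and prompt are illustrative assumptions.
from transformers import pipeline

# Load a pre-trained causal language model; no fine-tuning or gradient updates occur.
generator = pipeline("text-generation", model="gpt2")

# Describe the downstream task entirely in the prompt (here, English-to-French translation).
prompt = "Translate English to French:\ncheese =>"

# The frozen model attempts the task purely from its pre-training knowledge.
result = generator(prompt, max_new_tokens=8, num_return_sequences=1)
print(result[0]["generated_text"])

A larger engine such as GPT-3 handles this kind of prompt far more reliably than GPT-2; the point of the sketch is only that the task is specified at inference time rather than through additional training.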
The era ...