Chapter 22. Generative AI

At the time of this writing, it has been about a year and a half since the launch of ChatGPT shook the world. Since then, generative AI (GenAI) has advanced at a rapid pace, with frequent releases of increasingly capable models. Serious people are now talking seriously about the development of artificial general intelligence (AGI), generally understood as AI with near-humanlike or greater capabilities.

Of course, the recent wave of GenAI is the result of years of work in ML and computational neuroscience. A breakthrough moment was the introduction of the Transformer architecture in 2017, in the paper “Attention Is All You Need”. ChatGPT, Gemini, LLaMA, and most other recent models are built on the Transformer architecture, although other architectures have since been developed, including selective state-space models, starting with Mamba.

We expect the field to continue to grow, with continued advances, and so any discussion about GenAI in a book such as this one is somewhat doomed to rapid obsolescence. We’ve tried to shape this chapter to give you a broad understanding of the current state of the art so that you can better understand and keep pace with new advances. Therefore, this chapter goes through the main areas of GenAI development, including both model training and production considerations. It starts with a discussion of model types, followed by pretraining and model adaptation (fine-tuning). We then examine some of the current techniques for shaping pretrained ...
