1 Understanding large language models

This chapter covers

  • High-level explanations of the fundamental concepts behind large language models (LLMs)
  • Insights into the transformer architecture from which LLMs are derived
  • A plan for building an LLM from scratch

Large language models (LLMs), such as those offered in OpenAI’s ChatGPT, are deep neural network models that have been developed over the past few years. They ushered in a new era for natural language processing (NLP). Before the advent of LLMs, traditional methods excelled at categorization tasks such as email spam classification and straightforward pattern recognition that could be captured with handcrafted rules or simpler models. However, they typically underperformed in language ...

Get Build a Large Language Model (From Scratch) now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.