Book description
The definitive guide to LLMs, from architectures, pretraining, and fine-tuning to Retrieval Augmented Generation (RAG), multimodal Generative AI, and risks, with implementations using ChatGPT Plus (GPT-4), Hugging Face, and Vertex AI
Key Features
- Compare and contrast 20+ models (including GPT-4, BERT, and Llama 2) and multiple platforms and libraries to find the right solution for your project
- Apply RAG with LLMs using customized texts and embeddings
- Mitigate LLM risks, such as hallucinations, using moderation models and knowledge bases
- Purchase of the print or Kindle book includes a free eBook in PDF format
Book Description
Transformers for Natural Language Processing and Computer Vision, Third Edition, explores Large Language Model (LLM) architectures, applications, and various platforms (Hugging Face, OpenAI, and Google Vertex AI) used for Natural Language Processing (NLP) and Computer Vision (CV).
The book guides you through different transformer architectures to the latest Foundation Models and Generative AI. You’ll pretrain and fine-tune LLMs and work through different use cases, from summarization to implementing question-answering systems with embedding-based search techniques. You will also learn the risks of LLMs, from hallucinations and memorization to privacy, and how to mitigate such risks using moderation models with rule and knowledge bases. You’ll implement Retrieval Augmented Generation (RAG) with LLMs to improve the accuracy of your models and gain greater control over LLM outputs.
Dive into generative vision transformers and multimodal model architectures and build applications, such as image and video-to-text classifiers. Go further by combining different models and platforms and learning about AI agent replication.
This book provides you with an understanding of transformer architectures, pretraining, fine-tuning, LLM use cases, and best practices.
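The embedding-based search behind such question-answering systems reduces to ranking documents by vector similarity. A minimal sketch in Python (the vectors below are toy stand-ins invented for this description; a real system would compute them with an embedding model, as covered in the book):

```python
import numpy as np

# Toy stand-ins for sentence embeddings; a real system would produce
# these vectors with an embedding model (e.g., an OpenAI or Hugging Face encoder).
documents = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.8, 0.2]),
    "contact support": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, docs, top_k=1):
    """Return the names of the top_k documents most similar to the query."""
    ranked = sorted(docs, key=lambda name: cosine_similarity(query_vec, docs[name]),
                    reverse=True)
    return ranked[:top_k]

query = np.array([0.85, 0.15, 0.05])  # pretend embedding of "How do I get my money back?"
print(retrieve(query, documents))     # → ['refund policy']
```

Feeding the retrieved passage into an LLM prompt is the essence of the RAG pipelines the book builds.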
What you will learn
- Break down and understand the architectures of the Original Transformer, BERT, GPT models, T5, PaLM, ViT, CLIP, and DALL-E
- Fine-tune BERT, GPT, and PaLM 2 models
- Learn about different tokenizers and the best practices for preprocessing language data
- Pretrain a RoBERTa model from scratch
- Implement retrieval augmented generation and rule bases to mitigate hallucinations
- Visualize transformer model activity for deeper insights using BertViz, LIME, and SHAP
- Dive into vision transformers with CLIP, DALL-E 2, DALL-E 3, and GPT-4V
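Several of these topics lend themselves to small experiments. As a taste of the tokenizer material, here is a minimal byte-pair encoding (BPE) sketch in plain Python (an illustration written for this description, not code from the book): it repeatedly counts adjacent symbol pairs across a tiny corpus and merges the most frequent pair into a new subword.

```python
import re
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(pair, words):
    """Merge every standalone occurrence of the pair into a single symbol.

    The lookarounds keep the match aligned to symbol boundaries, so merging
    ("s", "t") cannot accidentally split an existing "es" symbol.
    """
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    joined = "".join(pair)
    return {pattern.sub(joined, word): freq for word, freq in words.items()}

# A tiny corpus: each word split into characters plus an end-of-word marker.
corpus = {"l o w </w>": 5, "l o w e r </w>": 2,
          "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(3):                    # learn three merge rules
    pairs = get_pair_counts(corpus)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    corpus = merge_pair(best, corpus)

print(sorted(corpus))
```

After three merges, the frequent suffix "est" (with its end-of-word marker) emerges as a learned subword, which is exactly how BPE tokenizers build their vocabularies.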
Who this book is for
This book is ideal for NLP and CV engineers, software developers, data scientists, machine learning engineers, and technical leaders looking to advance their LLM and generative AI skills or explore the latest trends in the field. Knowledge of Python and machine learning concepts is required to fully understand the use cases and code examples. However, with examples using LLM user interfaces, prompt engineering, and no-code model building, this book is great for anyone curious about the AI revolution.
Table of contents
- Preface
- What Are Transformers?
- How constant time complexity O(1) changed our lives forever
- From one token to an AI revolution
- Foundation Models
- The role of AI professionals
- The rise of transformer seamless APIs and assistants
- Summary
- Questions
- References
- Further reading
- Getting Started with the Architecture of the Transformer Model
- Emergent vs Downstream Tasks: The Unseen Depths of Transformers
- Advancements in Translations with Google Trax, Google Translate, and Gemini
- Diving into Fine-Tuning through BERT
- The architecture of BERT
- Fine-tuning BERT
- Defining a goal
- Hardware constraints
- Installing Hugging Face Transformers
- Importing the modules
- Specifying CUDA as the device for torch
- Loading the CoLA dataset
- Creating sentences, label lists, and adding BERT tokens
- Activating the BERT tokenizer
- Processing the data
- Creating attention masks
- Splitting the data into training and validation sets
- Converting all the data into torch tensors
- Selecting a batch size and creating an iterator
- BERT model configuration
- Loading the Hugging Face BERT uncased base model
- Optimizer grouped parameters
- The hyperparameters for the training loop
- The training loop
- Training evaluation
- Predicting and evaluating using the holdout dataset
- Evaluating using the Matthews correlation coefficient
- Matthews correlation coefficient evaluation for the whole dataset
- Building a Python interface to interact with the model
- Summary
- Questions
- References
- Further reading
- Pretraining a Transformer from Scratch through RoBERTa
- Training a tokenizer and pretraining a transformer
- Building KantaiBERT from scratch
- Step 1: Loading the dataset
- Step 2: Installing Hugging Face transformers
- Step 3: Training a tokenizer
- Step 4: Saving the files to disk
- Step 5: Loading the trained tokenizer files
- Step 6: Checking resource constraints: GPU and CUDA
- Step 7: Defining the configuration of the model
- Step 8: Reloading the tokenizer in transformers
- Step 9: Initializing a model from scratch
- Step 10: Building the dataset
- Step 11: Defining a data collator
- Step 12: Initializing the trainer
- Step 13: Pretraining the model
- Step 14: Saving the final model (+tokenizer + config) to disk
- Step 15: Language modeling with FillMaskPipeline
- Pretraining a Generative AI customer support model on X data
- Step 1: Downloading the dataset
- Step 2: Installing Hugging Face transformers
- Step 3: Loading and filtering the data
- Step 4: Checking resource constraints: GPU and CUDA
- Step 5: Defining the configuration of the model
- Step 6: Creating and processing the dataset
- Step 7: Initializing the trainer
- Step 8: Pretraining the model
- Step 9: Saving the model
- Step 10: User interface to chat with the Generative AI agent
- Further pretraining
- Limitations
- Next steps
- Summary
- Questions
- References
- Further reading
- The Generative AI Revolution with ChatGPT
- GPTs as GPTs
- The architecture of OpenAI GPT transformer models
- OpenAI models as assistants
- ChatGPT provides source code
- GitHub Copilot code assistant
- General-purpose prompt examples
- Getting started with ChatGPT – GPT-4 as an assistant
- 1. GPT-4 helps to explain how to write source code
- 2. GPT-4 creates a function to show the YouTube presentation of GPT-4 by Greg Brockman on March 14, 2023
- 3. GPT-4 creates an application for WikiArt to display images
- 4. GPT-4 creates an application to display IMDb reviews
- 5. GPT-4 creates an application to display a newsfeed
- 6. GPT-4 creates a k-means clustering (KMC) algorithm
- Getting started with the GPT-4 API
- Retrieval Augmented Generation (RAG) with GPT-4
- Summary
- Questions
- References
- Further reading
- Fine-Tuning OpenAI GPT Models
- Shattering the Black Box with Interpretable Tools
- Transformer visualization with BertViz
- Running BertViz
- Step 1: Installing BertViz and importing the modules
- Step 2: Loading the models and retrieving attention
- Step 3: Head view
- Step 4: Processing and displaying attention heads
- Step 5: Model view
- Step 6: Displaying the output probabilities of attention heads
- Streaming the output of the attention heads
- Visualizing word relationships using attention scores with pandas
- exBERT
- Interpreting Hugging Face transformers with SHAP
- Transformer visualization via dictionary learning
- Other interpretable AI tools
- Summary
- Questions
- References
- Further reading
- Investigating the Role of Tokenizers in Shaping Transformer Models
- Leveraging LLM Embeddings as an Alternative to Fine-Tuning
- LLM embeddings as an alternative to fine-tuning
- Fundamentals of text embedding with NLTK and Gensim
- Implementing question-answering systems with embedding-based search techniques
- Transfer learning with Ada embeddings
- Summary
- Questions
- References
- Further reading
- Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
- Summarization with T5 and ChatGPT
- Designing a universal text-to-text model
- The rise of text-to-text transformer models
- A prefix instead of task-specific formats
- The T5 model
- Text summarization with T5
- From text-to-text to new word predictions with OpenAI ChatGPT
- Summary
- Questions
- References
- Further reading
- Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
- Guarding the Giants: Mitigating Risks in Large Language Models
- Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
- From task-agnostic models to multimodal vision transformers
- ViT – Vision Transformer
- CLIP
- DALL-E 2 and DALL-E 3
- GPT-4V, DALL-E 3, and divergent semantic association
- Summary
- Questions
- References
- Further reading
- Transcending the Image-Text Boundary with Stable Diffusion
- Hugging Face AutoTrain: Training Vision Models without Coding
- Goal and scope of this chapter
- Getting started
- Uploading the dataset
- Training models with AutoTrain
- Deploying a model
- Running our models for inference
- Summary
- Questions
- References
- Further reading
- On the Road to Functional AGI with HuggingGPT and its Peers
- Beyond Human-Designed Prompts with Generative Ideation
- Part I: Defining generative ideation
- Part II: Automating prompt design for generative image design
- Part III: Automated generative ideation with Stable Diffusion
- The future is yours!
- Summary
- Questions
- References
- Further reading
- Appendix: Answers to the Questions
- Other Books You May Enjoy
- Index
Product information
- Title: Transformers for Natural Language Processing and Computer Vision - Third Edition
- Author(s):
- Release date: February 2024
- Publisher(s): Packt Publishing
- ISBN: 9781805128724