Transformers for Natural Language Processing - Second Edition

Book description

Learn how to use and implement transformers with Hugging Face and OpenAI (and others) by reading, running examples, investigating issues, asking the author questions, and interacting with our AI/ML community

Key Features

  • Pretrain a BERT-based model from scratch using Hugging Face
  • Fine-tune powerful transformer models, including OpenAI's GPT-3, to learn the logic of your data
  • Perform root cause analysis on hard NLP problems

Book Description

Transformers are...well...transforming the world of AI. There are many platforms and models out there, but which ones best suit your needs?

Transformers for Natural Language Processing, 2nd Edition, guides you through the world of transformers, highlighting the strengths of different models and platforms, while teaching you the problem-solving skills you need to tackle model weaknesses.

You'll use Hugging Face to pretrain a RoBERTa model from scratch, from building the dataset to defining the data collator to training the model.

If you're looking to fine-tune a pretrained model, including GPT-3, then Transformers for Natural Language Processing, 2nd Edition, shows you how with step-by-step guides.

The book investigates machine translations, speech-to-text, text-to-speech, question-answering, and many more NLP tasks. It provides techniques to solve hard language problems and may even help with fake news anxiety (read chapter 13 for more details).

You'll see how cutting-edge platforms, such as OpenAI, have taken transformers beyond language into computer vision tasks and code creation using Codex.

By the end of this book, you'll know how transformers work, how to implement them, and how to resolve issues like an AI detective!

What you will learn

  • Find out how ViT and CLIP label images (including blurry ones!) and create images from a sentence using DALL-E
  • Discover new techniques to investigate complex language problems
  • Compare and contrast the results of GPT-3 against T5, GPT-2, and BERT-based transformers
  • Carry out sentiment analysis, text summarization, casual speech analysis, machine translations, and more using TensorFlow, PyTorch, and GPT-3
  • Measure the productivity of key transformers to define their scope, potential, and limits in production

Who this book is for

If you want to learn about and apply transformers to your natural language (and image) data, this book is for you.

You'll need a good understanding of Python and deep learning and a basic understanding of NLP to benefit most from this book. Many platforms covered in this book provide interactive user interfaces, which allow readers with a general interest in NLP and AI to follow several chapters. And don't worry if you get stuck or have questions: this book gives you direct access to our AI/ML community and to the author, Denis Rothman, who will be there to guide you on your transformers journey!

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. What are Transformers?
    1. The ecosystem of transformers
      1. Industry 4.0
      2. Foundation models
        1. Is programming becoming a sub-domain of NLP?
        2. The future of artificial intelligence specialists
    2. Optimizing NLP models with transformers
      1. The background of transformers
    3. What resources should we use?
      1. The rise of Transformer 4.0 seamless APIs
      2. Choosing ready-to-use API-driven libraries
      3. Choosing a Transformer Model
      4. The role of Industry 4.0 artificial intelligence specialists
    4. Summary
    5. Questions
    6. References
  3. Getting Started with the Architecture of the Transformer Model
    1. The rise of the Transformer: Attention is All You Need
      1. The encoder stack
        1. Input embedding
        2. Positional encoding
        3. Sublayer 1: Multi-head attention
        4. Sublayer 2: Feedforward network
      2. The decoder stack
        1. Output embedding and position encoding
        2. The attention layers
        3. The FFN sublayer, the post-LN, and the linear layer
    2. Training and performance
    3. Transformer models in Hugging Face
    4. Summary
    5. Questions
    6. References
  4. Fine-Tuning BERT Models
    1. The architecture of BERT
      1. The encoder stack
        1. Preparing the pretraining input environment
        2. Pretraining and fine-tuning a BERT model
    2. Fine-tuning BERT
      1. Hardware constraints
      2. Installing the Hugging Face PyTorch interface for BERT
      3. Importing the modules
      4. Specifying CUDA as the device for torch
      5. Loading the dataset
      6. Creating sentences, label lists, and adding BERT tokens
      7. Activating the BERT tokenizer
      8. Processing the data
      9. Creating attention masks
      10. Splitting the data into training and validation sets
      11. Converting all the data into torch tensors
      12. Selecting a batch size and creating an iterator
      13. BERT model configuration
      14. Loading the Hugging Face BERT uncased base model
      15. Optimizer grouped parameters
      16. The hyperparameters for the training loop
      17. The training loop
      18. Training evaluation
      19. Predicting and evaluating using the holdout dataset
      20. Evaluating using the Matthews Correlation Coefficient
      21. The scores of individual batches
      22. Matthews evaluation for the whole dataset
    3. Summary
    4. Questions
    5. References
  5. Pretraining a RoBERTa Model from Scratch
    1. Training a tokenizer and pretraining a transformer
    2. Building KantaiBERT from scratch
      1. Step 1: Loading the dataset
      2. Step 2: Installing Hugging Face transformers
      3. Step 3: Training a tokenizer
      4. Step 4: Saving the files to disk
      5. Step 5: Loading the trained tokenizer files
      6. Step 6: Checking resource constraints: GPU and CUDA
      7. Step 7: Defining the configuration of the model
      8. Step 8: Reloading the tokenizer in transformers
      9. Step 9: Initializing a model from scratch
        1. Exploring the parameters
      10. Step 10: Building the dataset
      11. Step 11: Defining a data collator
      12. Step 12: Initializing the trainer
      13. Step 13: Pretraining the model
      14. Step 14: Saving the final model (+tokenizer + config) to disk
      15. Step 15: Language modeling with FillMaskPipeline
    3. Next steps
    4. Summary
    5. Questions
    6. References
  6. Downstream NLP Tasks with Transformers
    1. Transduction and the inductive inheritance of transformers
      1. The human intelligence stack
      2. The machine intelligence stack
    2. Transformer performances versus Human Baselines
      1. Evaluating models with metrics
        1. Accuracy score
        2. F1-score
        3. Matthews Correlation Coefficient (MCC)
      2. Benchmark tasks and datasets
        1. From GLUE to SuperGLUE
        2. Introducing higher Human Baselines standards
        3. The SuperGLUE evaluation process
      3. Defining the SuperGLUE benchmark tasks
        1. BoolQ
        2. Commitment Bank (CB)
        3. Multi-Sentence Reading Comprehension (MultiRC)
        4. Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
        5. Recognizing Textual Entailment (RTE)
        6. Words in Context (WiC)
        7. The Winograd schema challenge (WSC)
    3. Running downstream tasks
      1. The Corpus of Linguistic Acceptability (CoLA)
      2. Stanford Sentiment TreeBank (SST-2)
      3. Microsoft Research Paraphrase Corpus (MRPC)
      4. Winograd schemas
    4. Summary
    5. Questions
    6. References
  7. Machine Translation with the Transformer
    1. Defining machine translation
      1. Human transductions and translations
      2. Machine transductions and translations
    2. Preprocessing a WMT dataset
      1. Preprocessing the raw data
      2. Finalizing the preprocessing of the datasets
    3. Evaluating machine translation with BLEU
      1. Geometric evaluations
      2. Applying a smoothing technique
        1. Chencherry smoothing
    4. Translation with Google Translate
    5. Translations with Trax
      1. Installing Trax
      2. Creating the original Transformer model
      3. Initializing the model using pretrained weights
      4. Tokenizing a sentence
      5. Decoding from the Transformer
      6. De-tokenizing and displaying the translation
    6. Summary
    7. Questions
    8. References
  8. The Rise of Suprahuman Transformers with GPT-3 Engines
    1. Suprahuman NLP with GPT-3 transformer models
    2. The architecture of OpenAI GPT transformer models
      1. The rise of billion-parameter transformer models
      2. The increasing size of transformer models
        1. Context size and maximum path length
      3. From fine-tuning to zero-shot models
      4. Stacking decoder layers
      5. GPT-3 engines
    3. Generic text completion with GPT-2
      1. Step 9: Interacting with GPT-2
    4. Training a custom GPT-2 language model
      1. Step 12: Interactive context and completion examples
    5. Running OpenAI GPT-3 tasks
      1. Running NLP tasks online
      2. Getting started with GPT-3 engines
        1. Running our first NLP task with GPT-3
        2. NLP tasks and examples
    6. Comparing the output of GPT-2 and GPT-3
    7. Fine-tuning GPT-3
      1. Preparing the data
        1. Step 1: Installing OpenAI
        2. Step 2: Entering the API key
        3. Step 3: Activating OpenAI’s data preparation module
      2. Fine-tuning GPT-3
        1. Step 4: Creating an OS environment
        2. Step 5: Fine-tuning OpenAI’s Ada engine
        3. Step 6: Interacting with the fine-tuned model
    8. The role of an Industry 4.0 AI specialist
      1. Initial conclusions
    9. Summary
    10. Questions
    11. References
  9. Applying Transformers to Legal and Financial Documents for AI Text Summarization
    1. Designing a universal text-to-text model
      1. The rise of text-to-text transformer models
      2. A prefix instead of task-specific formats
      3. The T5 model
    2. Text summarization with T5
      1. Hugging Face
        1. Hugging Face transformer resources
      2. Initializing the T5-large transformer model
        1. Getting started with T5
        2. Exploring the architecture of the T5 model
      3. Summarizing documents with T5-large
        1. Creating a summarization function
        2. A general topic sample
        3. The Bill of Rights sample
        4. A corporate law sample
    3. Summarization with GPT-3
    4. Summary
    5. Questions
    6. References
  10. Matching Tokenizers and Datasets
    1. Matching datasets and tokenizers
      1. Best practices
        1. Step 1: Preprocessing
        2. Step 2: Quality control
        3. Continuous human quality control
      2. Word2Vec tokenization
        1. Case 0: Words in the dataset and the dictionary
        2. Case 1: Words not in the dataset or the dictionary
        3. Case 2: Noisy relationships
        4. Case 3: Words in the text but not in the dictionary
        5. Case 4: Rare words
        6. Case 5: Replacing rare words
        7. Case 6: Entailment
    2. Standard NLP tasks with specific vocabulary
      1. Generating unconditional samples with GPT-2
      2. Generating trained conditional samples
      3. Controlling tokenized data
    3. Exploring the scope of GPT-3
    4. Summary
    5. Questions
    6. References
  11. Semantic Role Labeling with BERT-Based Transformers
    1. Getting started with SRL
      1. Defining semantic role labeling
        1. Visualizing SRL
      2. Running a pretrained BERT-based model
        1. The architecture of the BERT-based model
        2. Setting up the BERT SRL environment
    2. SRL experiments with the BERT-based model
    3. Basic samples
      1. Sample 1
      2. Sample 2
      3. Sample 3
    4. Difficult samples
      1. Sample 4
      2. Sample 5
      3. Sample 6
    5. Questioning the scope of SRL
      1. The limit of predicate analysis
      2. Redefining SRL
    6. Summary
    7. Questions
    8. References
  12. Let Your Data Do the Talking: Story, Questions, and Answers
    1. Methodology
      1. Transformers and methods
    2. Method 0: Trial and error
    3. Method 1: NER first
      1. Using NER to find questions
        1. Location entity questions
        2. Person entity questions
    4. Method 2: SRL first
      1. Question-answering with ELECTRA
      2. Project management constraints
      3. Using SRL to find questions
    5. Next steps
      1. Exploring Haystack with a RoBERTa model
      2. Exploring Q&A with a GPT-3 engine
    6. Summary
    7. Questions
    8. References
  13. Detecting Customer Emotions to Make Predictions
    1. Getting started: Sentiment analysis transformers
    2. The Stanford Sentiment Treebank (SST)
      1. Sentiment analysis with RoBERTa-large
    3. Predicting customer behavior with sentiment analysis
      1. Sentiment analysis with DistilBERT
      2. Sentiment analysis with Hugging Face’s models’ list
        1. DistilBERT for SST
        2. MiniLM-L12-H384-uncased
        3. RoBERTa-large-mnli
        4. BERT-base multilingual model
    4. Sentiment analysis with GPT-3
    5. Some Pragmatic I4.0 thinking before we leave
      1. Investigating with SRL
      2. Investigating with Hugging Face
      3. Investigating with the GPT-3 playground
        1. GPT-3 code
    6. Summary
    7. Questions
    8. References
  14. Analyzing Fake News with Transformers
    1. Emotional reactions to fake news
      1. Cognitive dissonance triggers emotional reactions
        1. Analyzing a conflictual Tweet
        2. Behavioral representation of fake news
    2. A rational approach to fake news
      1. Defining a fake news resolution roadmap
      2. The gun control debate
        1. Sentiment analysis
        2. Named entity recognition (NER)
        3. Semantic Role Labeling (SRL)
        4. Gun control SRL
        5. Reference sites
      3. COVID-19 and former President Trump’s Tweets
        1. Semantic Role Labeling (SRL)
    3. Before we go
    4. Summary
    5. Questions
    6. References
  15. Interpreting Black Box Transformer Models
    1. Transformer visualization with BertViz
      1. Running BertViz
        1. Step 1: Installing BertViz and importing the modules
        2. Step 2: Load the models and retrieve attention
        3. Step 3: Head view
        4. Step 4: Processing and displaying attention heads
        5. Step 5: Model view
    2. LIT
      1. PCA
      2. Running LIT
    3. Transformer visualization via dictionary learning
      1. Transformer factors
      2. Introducing LIME
      3. The visualization interface
    4. Exploring models we cannot access
    5. Summary
    6. Questions
    7. References
  16. From NLP to Task-Agnostic Transformer Models
    1. Choosing a model and an ecosystem
    2. The Reformer
      1. Running an example
    3. DeBERTa
      1. Running an example
    4. From Task-Agnostic Models to Vision Transformers
      1. ViT – Vision Transformers
        1. The Basic Architecture of ViT
        2. Vision transformers in code
      2. CLIP
        1. The Basic Architecture of CLIP
        2. CLIP in code
      3. DALL-E
        1. The Basic Architecture of DALL-E
        2. DALL-E in code
    5. An expanding universe of models
    6. Summary
    7. Questions
    8. References
  17. The Emergence of Transformer-Driven Copilots
    1. Prompt engineering
      1. Casual English with a meaningful context
      2. Casual English with a metonymy
      3. Casual English with an ellipsis
      4. Casual English with vague context
      5. Casual English with sensors
      6. Casual English with sensors but no visible context
      7. Formal English conversation with no context
      8. Prompt engineering training
    2. Copilots
      1. GitHub Copilot
      2. Codex
    3. Domain-specific GPT-3 engines
      1. Embedding2ML
        1. Step 1: Installing and importing OpenAI
        2. Step 2: Loading the dataset
        3. Step 3: Combining the columns
        4. Step 4: Running the GPT-3 embedding
        5. Step 5: Clustering (k-means clustering) with the embeddings
        6. Step 6: Visualizing the clusters (t-SNE)
      2. Instruct series
      3. Content filter
    4. Transformer-based recommender systems
      1. General-purpose sequences
      2. Dataset pipeline simulation with RL using an MDP
        1. Training customer behaviors with an MDP
        2. Simulating consumer behavior with an MDP
        3. Making recommendations
    5. Computer vision
    6. Humans and AI copilots in metaverses
      1. From looking at to being in
    7. Summary
    8. Questions
    9. References
  18. Appendix I — Terminology of Transformer Models
    1. Stack
    2. Sublayer
    3. Attention heads
  19. Appendix II — Hardware Constraints for Transformer Models
    1. The Architecture and Scale of Transformers
    2. Why GPUs are so special
    3. GPUs are designed for parallel computing
    4. GPUs are also designed for matrix multiplication
    5. Implementing GPUs in code
    6. Testing GPUs with Google Colab
    7. Google Colab Free with a CPU
      1. Google Colab Free with a GPU
    8. Google Colab Pro with a GPU
  20. Appendix III — Generic Text Completion with GPT-2
    1. Step 1: Activating the GPU
    2. Step 2: Cloning the OpenAI GPT-2 repository
    3. Step 3: Installing the requirements
    4. Step 4: Checking the version of TensorFlow
    5. Step 5: Downloading the 345M-parameter GPT-2 model
    6. Steps 6-7: Intermediate instructions
    7. Steps 7b-8: Importing and defining the model
    8. Step 9: Interacting with GPT-2
    9. References
  21. Appendix IV — Custom Text Completion with GPT-2
    1. Training a GPT-2 language model
      1. Step 1: Prerequisites
      2. Steps 2 to 6: Initial steps of the training process
      3. Step 7: The N Shepperd training files
      4. Step 8: Encoding the dataset
      5. Step 9: Training a GPT-2 model
      6. Step 10: Creating a training model directory
      7. Step 11: Generating unconditional samples
      8. Step 12: Interactive context and completion examples
    2. References
  22. Appendix V — Answers to the Questions
    1. Chapter 1, What are Transformers?
    2. Chapter 2, Getting Started with the Architecture of the Transformer Model
    3. Chapter 3, Fine-Tuning BERT Models
    4. Chapter 4, Pretraining a RoBERTa Model from Scratch
    5. Chapter 5, Downstream NLP Tasks with Transformers
    6. Chapter 6, Machine Translation with the Transformer
    7. Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines
    8. Chapter 8, Applying Transformers to Legal and Financial Documents for AI Text Summarization
    9. Chapter 9, Matching Tokenizers and Datasets
    10. Chapter 10, Semantic Role Labeling with BERT-Based Transformers
    11. Chapter 11, Let Your Data Do the Talking: Story, Questions, and Answers
    12. Chapter 12, Detecting Customer Emotions to Make Predictions
    13. Chapter 13, Analyzing Fake News with Transformers
    14. Chapter 14, Interpreting Black Box Transformer Models
    15. Chapter 15, From NLP to Task-Agnostic Transformer Models
    16. Chapter 16, The Emergence of Transformer-Driven Copilots
  23. Other Books You May Enjoy
  24. Index

Product information

  • Title: Transformers for Natural Language Processing - Second Edition
  • Author(s): Denis Rothman, Antonio Gulli
  • Release date: March 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803247335