Transformers for Natural Language Processing

Book description

Publisher's Note: A new edition of this book is out now that includes working with GPT-3 and comparing the results with other models. It includes even more use cases, such as causal language analysis and computer vision tasks, as well as an introduction to OpenAI's Codex.

Key Features

  • Build and implement state-of-the-art language models, such as the original Transformer, BERT, T5, and GPT-2, that outperform classical deep learning models
  • Go through hands-on applications in Python using Google Colaboratory notebooks, with nothing to install on a local machine (see the sketch after this list)
  • Test transformer models on advanced use cases
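
As a taste of that hands-on approach, here is a minimal sketch of the kind of Colab-friendly workflow the book relies on. It is illustrative only: the pipeline task and the example sentence are assumptions, not code taken from the book.

```python
# A minimal, illustrative sketch of a Colab-style workflow with Hugging Face.
# The pipeline task and example sentence are assumptions, not the book's code.
# In Google Colaboratory, the only setup step is typically:
#   !pip install transformers

from transformers import pipeline

# Load a pretrained sentiment-analysis pipeline; the default checkpoint is
# downloaded on first use, so nothing needs to be installed locally beforehand.
classifier = pipeline("sentiment-analysis")

print(classifier("Transformers make hands-on NLP surprisingly approachable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```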

Book Description

The transformer architecture has proved to be revolutionary, outperforming the classical RNN and CNN models in use today. With an apply-as-you-learn approach, Transformers for Natural Language Processing investigates in detail how transformers are applied to machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains.

The book takes you through NLP with Python and examines various eminent transformer models and datasets created by pioneers such as Google, Facebook, Microsoft, OpenAI, and Hugging Face.

The book trains you in three stages. The first stage introduces you to transformer architectures, starting with the original Transformer, before moving on to the BERT, RoBERTa, and DistilBERT models. You will discover training methods for smaller transformers that can outperform GPT-3 in some cases. In the second stage, you will apply transformers for Natural Language Understanding (NLU) and Natural Language Generation (NLG). Finally, the third stage will help you grasp advanced language understanding techniques, such as optimizing social network datasets and identifying fake news.

By the end of this NLP book, you will understand transformers from a cognitive science perspective and be proficient in applying pretrained transformer models released by tech giants to various datasets.

What you will learn

  • Use the latest pretrained transformer models
  • Grasp the workings of the original Transformer, GPT-2, BERT, T5, and other transformer models
  • Create language understanding Python programs that outperform classical deep learning models
  • Use a variety of NLP platforms, including Hugging Face, Trax, and AllenNLP
  • Apply Python, TensorFlow, and Keras programs to sentiment analysis, text summarization, speech recognition, machine translation, and more, as sketched after this list
  • Measure the productivity of key transformers to define their scope, potential, and limits in production
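
For instance, the text summarization work in the T5 chapters follows a text-to-text pattern along these lines. This is a condensed, illustrative sketch only: the checkpoint name ("t5-small") and the input text are assumptions, not the book's exact code.

```python
# A minimal sketch of text-to-text summarization with a pretrained T5 checkpoint.
# The checkpoint name and input text are illustrative assumptions, not the book's code.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = (
    "The transformer architecture replaced recurrence with attention, "
    "allowing models to be pretrained in parallel on much larger corpora "
    "and then fine-tuned for downstream NLP tasks."
)

# T5 is a text-to-text model: the task is selected with a prefix such as "summarize: ".
inputs = tokenizer("summarize: " + text, return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```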

Who this book is for

Since the book does not teach basic programming, you must be familiar with neural networks, Python, PyTorch, and TensorFlow to follow how they are used to implement transformers. Readers who can benefit the most from this book include experienced deep learning and NLP practitioners, as well as data analysts and data scientists who want to process the increasing amounts of language-driven data.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Getting Started with the Model Architecture of the Transformer
    1. The background of the Transformer
    2. The rise of the Transformer: Attention Is All You Need
      1. The encoder stack
        1. Input embedding
        2. Positional encoding
        3. Sub-layer 1: Multi-head attention
        4. Sub-layer 2: Feedforward network
      2. The decoder stack
        1. Output embedding and position encoding
        2. The attention layers
        3. The FFN sub-layer, the Post-LN, and the linear layer
    3. Training and performance
      1. Before we end the chapter
    4. Summary
    5. Questions
    6. References
  3. Fine-Tuning BERT Models
    1. The architecture of BERT
      1. The encoder stack
        1. Preparing the pretraining input environment
      2. Pretraining and fine-tuning a BERT model
    2. Fine-tuning BERT
      1. Activating the GPU
      2. Installing the Hugging Face PyTorch interface for BERT
      3. Importing the modules
      4. Specifying CUDA as the device for torch
      5. Loading the dataset
      6. Creating sentences, label lists, and adding BERT tokens
      7. Activating the BERT tokenizer
      8. Processing the data
      9. Creating attention masks
      10. Splitting data into training and validation sets
      11. Converting all the data into torch tensors
      12. Selecting a batch size and creating an iterator
      13. BERT model configuration
      14. Loading the Hugging Face BERT uncased base model
      15. Optimizer grouped parameters
      16. The hyperparameters for the training loop
      17. The training loop
      18. Training evaluation
      19. Predicting and evaluating using the holdout dataset
      20. Evaluating using Matthews Correlation Coefficient
      21. The score of individual batches
      22. Matthews evaluation for the whole dataset
    3. Summary
    4. Questions
    5. References
  4. Pretraining a RoBERTa Model from Scratch
    1. Training a tokenizer and pretraining a transformer
    2. Building KantaiBERT from scratch
      1. Step 1: Loading the dataset
      2. Step 2: Installing Hugging Face transformers
      3. Step 3: Training a tokenizer
      4. Step 4: Saving the files to disk
      5. Step 5: Loading the trained tokenizer files
      6. Step 6: Checking resource constraints: GPU and CUDA
      7. Step 7: Defining the configuration of the model
      8. Step 8: Reloading the tokenizer in transformers
      9. Step 9: Initializing a model from scratch
        1. Exploring the parameters
      10. Step 10: Building the dataset
      11. Step 11: Defining a data collator
      12. Step 12: Initializing the trainer
      13. Step 13: Pretraining the model
      14. Step 14: Saving the final model (+tokenizer + config) to disk
      15. Step 15: Language modeling with FillMaskPipeline
    3. Next steps
    4. Summary
    5. Questions
    6. References
  5. Downstream NLP Tasks with Transformers
    1. Transduction and the inductive inheritance of transformers
      1. The human intelligence stack
      2. The machine intelligence stack
    2. Transformer performances versus Human Baselines
      1. Evaluating models with metrics
        1. Accuracy score
        2. F1-score
        3. Matthews Correlation Coefficient (MCC)
      2. Benchmark tasks and datasets
        1. From GLUE to SuperGLUE
        2. Introducing higher Human Baseline standards
        3. The SuperGLUE evaluation process
      3. Defining the SuperGLUE benchmark tasks
        1. BoolQ
        2. Commitment Bank (CB)
        3. Multi-Sentence Reading Comprehension (MultiRC)
        4. Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
        5. Recognizing Textual Entailment (RTE)
        6. Words in Context (WiC)
        7. The Winograd Schema Challenge (WSC)
    3. Running downstream tasks
      1. The Corpus of Linguistic Acceptability (CoLA)
      2. Stanford Sentiment TreeBank (SST-2)
      3. Microsoft Research Paraphrase Corpus (MRPC)
      4. Winograd schemas
    4. Summary
    5. Questions
    6. References
  6. Machine Translation with the Transformer
    1. Defining machine translation
      1. Human transductions and translations
      2. Machine transductions and translations
    2. Preprocessing a WMT dataset
      1. Preprocessing the raw data
      2. Finalizing the preprocessing of the datasets
    3. Evaluating machine translation with BLEU
      1. Geometric evaluations
      2. Applying a smoothing technique
        1. Chencherry smoothing
    4. Translations with Trax
      1. Installing Trax
      2. Creating a Transformer model
      3. Initializing the model using pretrained weights
      4. Tokenizing a sentence
      5. Decoding from the Transformer
      6. De-tokenizing and displaying the translation
    5. Summary
    6. Questions
    7. References
  7. Text Generation with OpenAI GPT-2 and GPT-3 Models
    1. The rise of billion-parameter transformer models
      1. The increasing size of transformer models
        1. Context size and maximum path length
    2. Transformers, reformers, PET, or GPT?
      1. The limits of the original Transformer architecture
        1. Running BertViz
      2. The Reformer
      3. Pattern-Exploiting Training (PET)
        1. The philosophy of Pattern-Exploiting Training (PET)
    3. It's time to make a decision
    4. The architecture of OpenAI GPT models
      1. From fine-tuning to zero-shot models
      2. Stacking decoder layers
    5. Text completion with GPT-2
      1. Step 1: Activating the GPU
      2. Step 2: Cloning the OpenAI GPT-2 repository
      3. Step 3: Installing the requirements
      4. Step 4: Checking the version of TensorFlow
      5. Step 5: Downloading the 345M parameter GPT-2 model
      6. Steps 6-7: Intermediate instructions
      7. Steps 7b-8: Importing and defining the model
      8. Step 9: Interacting with GPT-2
    6. Training a GPT-2 language model
      1. Step 1: Prerequisites
      2. Steps 2 to 6: Initial steps of the training process
      3. Step 7: The N Shepperd training files
      4. Step 8: Encoding the dataset
      5. Step 9: Training the model
      6. Step 10: Creating a training model directory
    7. Context and completion examples
    8. Generating music with transformers
    9. Summary
    10. Questions
    11. References
  8. Applying Transformers to Legal and Financial Documents for AI Text Summarization
    1. Designing a universal text-to-text model
      1. The rise of text-to-text transformer models
      2. A prefix instead of task-specific formats
      3. The T5 model
    2. Text summarization with T5
      1. Hugging Face
        1. Hugging Face transformer resources
      2. Initializing the T5-large transformer model
        1. Getting started with T5
        2. Exploring the architecture of the T5 model
      3. Summarizing documents with T5-large
        1. Creating a summarization function
        2. A general topic sample
        3. The Bill of Rights sample
        4. A corporate law sample
    3. Summary
    4. Questions
    5. References
  9. Matching Tokenizers and Datasets
    1. Matching datasets and tokenizers
      1. Best practices
        1. Step 1: Preprocessing
        2. Step 2: Post-processing
        3. Continuous human quality control
      2. Word2Vec tokenization
        1. Case 0: Words in the dataset and the dictionary
        2. Case 1: Words not in the dataset or the dictionary
        3. Case 2: Noisy relationships
        4. Case 3: Rare words
        5. Case 4: Replacing rare words
        6. Case 5: Entailment
    2. Standard NLP tasks with specific vocabulary
      1. Generating unconditional samples with GPT-2
        1. Controlling tokenized data
      2. Generating trained conditional samples
    3. T5 Bill of Rights Sample
      1. Summarizing the Bill of Rights, version 1
      2. Summarizing the Bill of Rights, version 2
    4. Summary
    5. Questions
    6. References
  10. Semantic Role Labeling with BERT-Based Transformers
    1. Getting started with SRL
      1. Defining Semantic Role Labeling
        1. Visualizing SRL
      2. Running a pretrained BERT-based model
        1. The architecture of the BERT-based model
        2. Setting up the BERT SRL environment
    2. SRL experiments with the BERT-based model
    3. Basic samples
      1. Sample 1
      2. Sample 2
      3. Sample 3
    4. Difficult samples
      1. Sample 4
      2. Sample 5
      3. Sample 6
    5. Summary
    6. Questions
    7. References
  11. Let Your Data Do the Talking: Story, Questions, and Answers
    1. Methodology
      1. Transformers and methods
    2. Method 0: Trial and error
    3. Method 1: NER first
      1. Using NER to find questions
        1. Location entity questions
        2. Person entity questions
    4. Method 2: SRL first
      1. Question-answering with ELECTRA
      2. Project management constraints
      3. Using SRL to find questions
    5. Next steps
      1. Exploring Haystack with a RoBERTa model
    6. Summary
    7. Questions
    8. References
  12. Detecting Customer Emotions to Make Predictions
    1. Getting started: Sentiment analysis transformers
    2. The Stanford Sentiment Treebank (SST)
      1. Sentiment analysis with RoBERTa-large
    3. Predicting customer behavior with sentiment analysis
      1. Sentiment analysis with DistilBERT
      2. Sentiment analysis with Hugging Face's models list
        1. DistilBERT for SST
        2. MiniLM-L12-H384-uncased
        3. RoBERTa-large-mnli
        4. BERT-base multilingual model
    4. Summary
    5. Questions
    6. References
  13. Analyzing Fake News with Transformers
    1. Emotional reactions to fake news
      1. Cognitive dissonance triggers emotional reactions
        1. Analyzing a conflictual Tweet
        2. Behavioral representation of fake news
    2. A rational approach to fake news
      1. Defining a fake news resolution roadmap
      2. Gun control
        1. Sentiment analysis
        2. Named entity recognition (NER)
        3. Semantic Role Labeling (SRL)
        4. Reference sites
      3. COVID-19 and former President Trump's Tweets
        1. Semantic Role Labeling (SRL)
    3. Before we go
      1. Looking for the silver bullet
      2. Looking for reliable training methods
    4. Summary
    5. Questions
    6. References
  14. Appendix: Answers to the Questions
    1. Chapter 1, Getting Started with the Model Architecture of the Transformer
    2. Chapter 2, Fine-Tuning BERT Models
    3. Chapter 3, Pretraining a RoBERTa Model from Scratch
    4. Chapter 4, Downstream NLP Tasks with Transformers
    5. Chapter 5, Machine Translation with the Transformer
    6. Chapter 6, Text Generation with OpenAI GPT-2 and GPT-3 Models
    7. Chapter 7, Applying Transformers to Legal and Financial Documents for AI Text Summarization
    8. Chapter 8, Matching Tokenizers and Datasets
    9. Chapter 9, Semantic Role Labeling with BERT-Based Transformers
    10. Chapter 10, Let Your Data Do the Talking: Story, Questions, and Answers
    11. Chapter 11, Detecting Customer Emotions to Make Predictions
    12. Chapter 12, Analyzing Fake News with Transformers
  15. Other Books You May Enjoy
  16. Index

Product information

  • Title: Transformers for Natural Language Processing
  • Author(s): Denis Rothman
  • Release date: January 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781800565791