
Transformers for Natural Language Processing and Computer Vision - Third Edition

Name: Transformers for Natural Language Processing and Computer Vision - Third Edition
Author: Denis Rothman
ISBN: 9781805128724


February 2024

Intermediate to advanced

730 pages

17h 59m

English

Packt Publishing


Preface
Who this book is for; What this book covers; Part I: The Foundations of Transformers; Part II: The Rise of Suprahuman NLP; Part III: Generative Computer Vision: A New Way to See the World; To get the most out of this book; Get in touch; Making the Most Out of This Book – Get to Know Your Free Benefits; Unlock Your Book’s Exclusive Benefits; How to unlock these benefits in three easy steps; Need help?
What Are Transformers?
Foundation Models; From general-purpose to specific tasks; A brief history of how transformers were born; From one token to an AI revolution; The new role of AI professionals; The future of AI professionals; What resources should we use?; Decision-making guidelines; The rise of seamless transformer APIs; Choosing ready-to-use API-driven libraries; Choosing a cloud platform and transformer model; Summary; Questions; References; Further reading
Getting Started with the Architecture of the Transformer Model
The rise of the Transformer: Attention Is All You Need; The encoder stack; Input embedding; Positional encoding; Sublayer 1: Multi-head attention; Sublayer 2: Feedforward network; The decoder stack; Output embedding and position encoding; The attention layers; The FFN sublayer, the post-LN, and the linear layer; Training and performance; Hugging Face transformer models; Summary; Questions; References; Further reading
Emergent vs Downstream Tasks: The Unseen Depths of Transformers
The paradigm shift: What is an NLP task?; Inside the head of the attention sublayer of a transformer; Exploring emergence with ChatGPT; Investigating the potential of downstream tasks; Evaluating models with metrics; Accuracy score; F1-score; MCC; Human evaluation; Benchmark tasks and datasets; Defining the SuperGLUE benchmark tasks; Running downstream tasks; The Corpus of Linguistic Acceptability (CoLA); Stanford Sentiment TreeBank (SST-2); Microsoft Research Paraphrase Corpus (MRPC); Winograd schemas; Summary; Questions; References; Further reading
Advancements in Translations with Google Trax, Google Translate, and Gemini
Defining machine translation; Human transductions and translations; Machine transductions and translations; Evaluating machine translations; Preprocessing a WMT dataset; Preprocessing the raw data; Finalizing the preprocessing of the datasets; Evaluating machine translations with BLEU; Geometric evaluations; Applying a smoothing technique; Translations with Google Trax; Installing Trax; Creating the Original Transformer model; Initializing the model using pretrained weights; Tokenizing a sentence; Decoding from the Transformer; De-tokenizing and displaying the translation; Translation with Google Translate; Translation with a Google Translate AJAX API Wrapper; Implementing googletrans; Translation with Gemini; Gemini’s potential; Summary; Questions; References; Further reading
Diving into Fine-Tuning through BERT
The architecture of BERT; The encoder stack; Preparing the pretraining input environment; Pretraining and fine-tuning a BERT model; Fine-tuning BERT; Defining a goal; Hardware constraints; Installing Hugging Face Transformers; Importing the modules; Specifying CUDA as the device for torch; Loading the CoLA dataset; Creating sentences, label lists, and adding BERT tokens; Activating the BERT tokenizer; Processing the data; Creating attention masks; Splitting the data into training and validation sets; Converting all the data into torch tensors; Selecting a batch size and creating an iterator; BERT model configuration; Loading the Hugging Face BERT uncased base model; Optimizer grouped parameters; The hyperparameters for the training loop; The training loop; Training evaluation; Predicting and evaluating using the holdout dataset; Exploring the prediction process; Evaluating using the Matthews correlation coefficient; Matthews correlation coefficient evaluation for the whole dataset; Building a Python interface to interact with the model; Saving the model; Creating an interface for the trained model; Interacting with the model; Summary; Questions; References; Further reading
Pretraining a Transformer from Scratch through RoBERTa
Training a tokenizer and pretraining a transformer; Building KantaiBERT from scratch; Step 1: Loading the dataset; Step 2: Installing Hugging Face transformers; Step 3: Training a tokenizer; Step 4: Saving the files to disk; Step 5: Loading the trained tokenizer files; Step 6: Checking resource constraints: GPU and CUDA; Step 7: Defining the configuration of the model; Step 8: Reloading the tokenizer in transformers; Step 9: Initializing a model from scratch; Exploring the parameters; Step 10: Building the dataset; Step 11: Defining a data collator; Step 12: Initializing the trainer; Step 13: Pretraining the model; Step 14: Saving the final model (+tokenizer + config) to disk; Step 15: Language modeling with FillMaskPipeline; Pretraining a Generative AI customer support model on X data; Step 1: Downloading the dataset; Step 2: Installing Hugging Face transformers; Step 3: Loading and filtering the data; Step 4: Checking Resource Constraints: GPU and CUDA; Step 5: Defining the configuration of the model; Step 6: Creating and processing the dataset; Step 7: Initializing the trainer; Step 8: Pretraining the model; Step 9: Saving the model; Step 10: User interface to chat with the Generative AI agent; Further pretraining; Limitations; Next steps; Summary; Questions; References; Further reading
The Generative AI Revolution with ChatGPT
GPTs as GPTs; Improvement; Diffusion; New application sectors; Self-service assistants; Development assistants; Pervasiveness; The architecture of OpenAI GPT transformer models; The rise of billion-parameter transformer models; The increasing size of transformer models; Context size and maximum path length; From fine-tuning to zero-shot models; Stacking decoder layers; GPT models; OpenAI models as assistants; ChatGPT provides source code; GitHub Copilot code assistant; General-purpose prompt examples; Getting started with ChatGPT – GPT-4 as an assistant; 1. GPT-4 helps to explain how to write source code; 2. GPT-4 creates a function to show the YouTube presentation of GPT-4 by Greg Brockman on March 14, 2023; 3. GPT-4 creates an application for WikiArt to display images; 4. GPT-4 creates an application to display IMDb reviews; 5. GPT-4 creates an application to display a newsfeed; 6. GPT-4 creates a k-means clustering (KMC) algorithm; Getting started with the GPT-4 API; Running our first NLP task with GPT-4; Steps 1: Installing OpenAI and Step 2: Entering the API key; Step 3: Running an NLP task with GPT-4; Key hyperparameters; Running multiple NLP tasks; Retrieval Augmented Generation (RAG) with GPT-4; Installation; Document retrieval; Augmented retrieval generation; Summary; Questions; References; Further reading
Fine-Tuning OpenAI GPT Models
Risk management; Fine-tuning a GPT model for completion (generative); 1. Preparing the dataset; 1.1. Preparing the data in JSON; 1.2. Converting the data to JSONL; 2. Fine-tuning an original model; 3. Running the fine-tuned GPT model; 4. Managing fine-tuned jobs and models; Before leaving; Summary; Questions; References; Further reading
Shattering the Black Box with Interpretable Tools
Transformer visualization with BertViz; Running BertViz; Step 1: Installing BertViz and importing the modules; Step 2: Load the models and retrieve attention; Step 3: Head view; Step 4: Processing and displaying attention heads; Step 5: Model view; Step 6: Displaying the output probabilities of attention heads; Streaming the output of the attention heads; Visualizing word relationships using attention scores with pandas; exBERT; Interpreting Hugging Face transformers with SHAP; Introducing SHAP; Explaining Hugging Face outputs with SHAP; Transformer visualization via dictionary learning; Transformer factors; Introducing LIME; The visualization interface; Other interpretable AI tools; LIT; PCA; Running LIT; OpenAI LLMs explain neurons in transformers; Limitations and human control; Summary; Questions; References; Further reading

Investigating the Role of Tokenizers in Shaping Transformer Models
Matching datasets and tokenizers; Best practices; Step 1: Preprocessing; Step 2: Quality control; Step 3: Continuous human quality control; Word2Vec tokenization; Case 0: Words in the dataset and the dictionary; Case 1: Words not in the dataset or the dictionary; Case 2: Noisy relationships; Case 3: Words in a text but not in the dictionary; Case 4: Rare words; Case 5: Replacing rare words; Exploring sentence and WordPiece tokenizers to understand the efficiency of subword tokenizers for transformers; Word and sentence tokenizers; Sentence tokenization; Word tokenization; Regular expression tokenization; Treebank tokenization; White space tokenization; Punkt tokenization; Word punctuation tokenization; Multi-word tokenization; Subword tokenizers; Unigram language model tokenization; SentencePiece; Byte-Pair Encoding (BPE); WordPiece; Exploring in code; Detecting the type of tokenizer; Displaying token-ID mappings; Analyzing and controlling the quality of token-ID mappings; Summary; Questions; References; Further reading
Leveraging LLM Embeddings as an Alternative to Fine-Tuning
LLM embeddings as an alternative to fine-tuning; From prompt design to prompt engineering; Fundamentals of text embedding with NLTK and Gensim; Installing libraries; 1. Reading the text file; 2. Tokenizing the text with Punkt; Preprocessing the tokens; 3. Embedding with Gensim and Word2Vec; 4. Model description; 5. Accessing a word and vector; 6. Exploring Gensim’s vector space; 7. TensorFlow Projector; Implementing question-answering systems with embedding-based search techniques; 1. Installing the libraries and selecting the models; 2. Implementing the embedding model and the GPT model; 2.1 Evaluating the model with a knowledge base: GPT can answer questions; 2.2 Add a knowledge base; 2.3 Evaluating the model without a knowledge base: GPT cannot answer questions; 3. Prepare search data; 4. Search; 5. Ask; 5.1. Example question; 5.2. Troubleshooting wrong answers; Transfer learning with Ada embeddings; 1. The Amazon Fine Food Reviews dataset; 1.2. Data preparation; 2. Running Ada embeddings and saving them for future reuse; 3. Clustering; 3.1. Find the clusters using k-means clustering; 3.2. Display clusters with t-SNE; 4. Text samples in the clusters and naming the clusters; Summary; Questions; References; Further reading
Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4
Getting started with cutting-edge SRL; Entering the syntax-free world of AI; Defining SRL; Visualizing SRL; SRL experiments with ChatGPT with GPT-4; Basic sample; Difficult sample; Questioning the scope of SRL; The challenges of predicate analysis; Redefining SRL; From task-specific SRL to emergence with ChatGPT; 1. Installing OpenAI; 2. GPT-4 dialog function; 3. SRL; Sample 1 (basic); Sample 2 (basic); Sample 3 (basic); Sample 4 (difficult); Sample 5 (difficult); Sample 6 (difficult); Summary; Questions; References; Further reading
Summarization with T5 and ChatGPT
Designing a universal text-to-text model; The rise of text-to-text transformer models; A prefix instead of task-specific formats; The T5 model; Text summarization with T5; Hugging Face; Selecting a Hugging Face transformer model; Initializing the T5-large transformer model; Getting started with T5; Exploring the architecture of the T5 model; Summarizing documents with T5-large; Creating a summarization function; A general topic sample; The Bill of Rights sample; A corporate law sample; From text-to-text to new word predictions with OpenAI ChatGPT; Comparing T5 and ChatGPT’s summarization methods; Pretraining; Specific versus non-specific tasks; Summarization with ChatGPT; Summary; Questions; References; Further reading
Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2
Architecture; Pathways; Client; Resource manager; Intermediate representation; Compiler; Scheduler; Executor; PaLM; Parallel layer processing that increases training speed; Shared input-output embeddings, which saves memory; No biases, which improves training stability; Rotary Positional Embedding (RoPE) improves model quality; SwiGLU activations improve model quality; PaLM 2; Improved performance, faster, and more efficient; Scaling laws, optimal model size, and the number of parameters; State-of-the-art (SOA) performance and a new training methodology; Assistants; Gemini; Google Workspace; Google Colab Copilot; Vertex AI PaLM 2 interface; Vertex AI PaLM 2 assistant; Vertex AI PaLM 2 API; Question answering; Question-answer task; Summarization of a conversation; Sentiment analysis; Multi-choice problems; Code; Fine-tuning; Creating a bucket; Fine-tuning the model; Summary; Questions; References; Further reading
Guarding the Giants: Mitigating Risks in Large Language Models
The emergence of functional AGI; Cutting-edge platform installation limitations; Auto-BIG-bench; WandB; When will AI agents replicate?; Function: `create_vocab`; Process; Function: `scrape_wikipedia`; Process; Function: `create_dataset`; Process; Classes: `TextDataset`, `Encoder`, and `Decoder`; Function: `count_parameters`; Function: `main`; Process; Saving and Executing the Model; Risk management; Hallucinations and memorization; Memorization; Risky emergent behaviors; Disinformation; Influence operations; Harmful content; Privacy; Cybersecurity; Risk mitigation tools with RLHF and RAG; 1. Input and output moderation with transformers and a rule base; 2. Building a knowledge base for ChatGPT and GPT-4; Adding keywords; 3. Parsing the user requests and accessing the KB; 4. Generating ChatGPT content with a dialog function; Token control; Moderation; Summary; Questions; References; Further reading
Beyond Text: Vision Transformers in the Dawn of Revolutionary AI
From task-agnostic models to multimodal vision transformers; ViT – Vision Transformer; The basic architecture of ViT; Step 1: Splitting the image into patches; Step 2: Building a vocabulary of image patches; Step 3: The transformer; Vision transformers in code; A feature extractor simulator; The transformer; Configuration and shapes; CLIP; The basic architecture of CLIP; CLIP in code; DALL-E 2 and DALL-E 3; The basic architecture of DALL-E; Getting started with the DALL-E 2 and DALL-E 3 API; Creating a new image; Creating a variation of an image; From research to mainstream AI with DALL-E; GPT-4V, DALL-E 3, and divergent semantic association; Defining divergent semantic association; Creating an image with ChatGPT Plus with DALL-E; Implementing the GPT-4V API and experimenting with DAT; Example 1: A standard image and text; Example 2: Divergent semantic association, moderate divergence; Example 3: Divergent semantic association, high divergence; Summary; Questions; References; Further Reading
Transcending the Image-Text Boundary with Stable Diffusion
Transcending image generation boundaries; Part I: Defining text-to-image with Stable Diffusion; 1. Text embedding using a transformer encoder; 2. Random image creation with noise; 3. Stable Diffusion model downsampling; 4. Decoder upsampling; 5. Output image; Running the Keras Stable Diffusion implementation; Part II: Running text-to-image with Stable Diffusion; Generative AI Stable Diffusion for a Divergent Association Task (DAT); Part III: Video; Text-to-video with Stability AI animation; Text-to-video, with a variation of OpenAI CLIP; A video-to-text model with TimeSformer; Preparing the video frames; Putting the TimeSformer to work to make predictions on the video frames; Summary; Questions; References; Further reading
Hugging Face AutoTrain: Training Vision Models without Coding
Goal and scope of this chapter; Getting started; Uploading the dataset; No coding?; Training models with AutoTrain; Deploying a model; Running our models for inference; Retrieving validation images; The program will now attempt to classify the validation images. We will see how a vision transformer reacts to this image; Inference: image classification; Validation experimentation on the trained models; ViTForImageClassification; SwinForImageClassification 1; BeitForImageClassification; SwinForImageClassification 2; ConvNextForImageClassification; ResNetForImageClassification; Trying the top ViT model with a corpus; Summary; Questions; References; Further reading
On the Road to Functional AGI with HuggingGPT and its Peers
Defining F-AGI; Installing and importing; Validation set; Level 1 image: easy; Level 2 image: difficult; Level 3 image: very difficult; HuggingGPT; Level 1: Easy; Level 2: Difficult; Level 3: Very difficult; CustomGPT; Google Cloud Vision; Level 1: Easy; Level 2: Difficult; Level 3: Very difficult; Model chaining: Chaining Google Cloud Vision to ChatGPT; Model Chaining with Runway Gen-2; Midjourney: Imagine a ship in the galaxy; Gen-2: Make this ship sail the sea; Summary; Questions; References; Further Reading
Beyond Human-Designed Prompts with Generative Ideation
Part I: Defining generative ideation; Automated ideation architecture; Scope and limitations; Part II: Automating prompt design for generative image design; ChatGPT/GPT-4 HTML presentation; ChatGPT with GPT-4 provides the text for the presentation; ChatGPT with GPT-4 provides a graph in HTML to illustrate the presentation; Llama 2; A brief introduction to Llama 2; Implementing Llama 2 with Hugging Face; Midjourney; Discord API for Midjourney; Microsoft Designer; Part III: Automated generative ideation with Stable Diffusion; 1. No prompt: Automated instruction for GPT-4; 2. Generative AI (prompt generation) using ChatGPT with GPT-4; 3. and 4. Generative AI with Stable Diffusion and displaying images; The future is yours!; The future of development through VR-AI; The groundbreaking shift: Parallelization of development through the fusion of VR and AI; Opportunities and risks; Summary; Questions; References; Further reading
Appendix A: Revolutionizing AI: The Power of Optimized Time Complexity in Transformer Models
How constant time complexity O(1) of an operation changed our lives forever; O(1) attention conquers O(n) recurrent methods; Attention layer; Recurrent layer; The magic of the computational time complexity of an attention layer; Computational time complexity with a CPU; Computational time complexity with a GPU; Computational time complexity with a TPU; TPU-LLM; How one token sparked an AI revolution
Appendix B: Answers to the Questions
Chapter 1, What Are Transformers?; Chapter 2, Getting Started with the Architecture of the Transformer Model; Chapter 3, Emergent vs Downstream Tasks: The Unseen Depths of Transformers; Chapter 4, Advancements in Translations with Google Trax, Google Translate, and Gemini; Chapter 5, Diving into Fine-Tuning through BERT; Chapter 6, Pretraining a Transformer from Scratch through RoBERTa; Chapter 7, The Generative AI Revolution with ChatGPT; Chapter 8, Fine-Tuning OpenAI GPT Models; Chapter 9, Shattering the Black Box with Interpretable Tools; Chapter 10, Investigating the Role of Tokenizers in Shaping Transformer Models; Chapter 11, Leveraging LLM Embeddings as an Alternative to Fine-Tuning; Chapter 12, Toward Syntax-Free Semantic Role Labeling with ChatGPT and GPT-4; Chapter 13, Summarization with T5 and ChatGPT; Chapter 14, Exploring Cutting-Edge LLMs with Vertex AI and PaLM 2; Chapter 15, Guarding the Giants: Mitigating Risks in Large Language Models; Chapter 16, Beyond Text: Vision Transformers in the Dawn of Revolutionary AI; Chapter 17, Transcending the Image-Text Boundary with Stable Diffusion; Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding; Chapter 19, On the Road to Functional AGI with HuggingGPT and its Peers; Chapter 20, Beyond Human-Designed Prompts with Generative Ideation
Other Books You May Enjoy
Index

Content preview from Transformers for Natural Language Processing and Computer Vision - Third Edition

16 Beyond Text: Vision Transformers in the Dawn of Revolutionary AI

Up to now, we have examined variations of the Original Transformer model with encoder and decoder layers. We have also explored other models with encoder-only or decoder-only stacks of layers. The size of the layers and the number of parameters have also increased. However, the fundamental architecture of the Transformer retains its original structure, with identical layers and parallelized computation of the attention heads.

In this chapter, we will explore the innovative transformer models that respect the basic structure of the Original Transformer but make some significant changes. Scores of transformer models will appear, like the many possibilities a box of LEGO© pieces gives. ...



