Applied Natural Language Processing in the Enterprise

Book description

NLP is one of the hottest topics in AI today. Having lagged for years behind other deep learning fields such as computer vision, NLP only recently gained mainstream popularity. Google, Facebook, and OpenAI have open-sourced large pretrained language models, but many organizations today still struggle with building and adopting NLP applications. This hands-on guide helps you learn the process quickly.

If you have a basic to intermediate understanding of machine learning and programming experience with Python, you’ll learn how to build and deploy real-world NLP applications in your organization. Authors Ankur Patel and Ajay Uppili Arasanipalai walk you through the process without bogging you down in theory.

  • Understand how state-of-the-art NLP models work
  • Learn the tools of the trade, including frameworks popular today
  • Perform NLP tasks such as text classification, semantic search, and reading comprehension
  • Solve problems using new models like transformers and techniques such as transfer learning
  • Build NLP models from scratch with performance comparable to or better than out-of-the-box systems
  • Deploy your models to production and maintain their performance
  • Implement a suite of NLP algorithms using Python and PyTorch

Table of contents

  1. Preface
    1. Natural Language Processing
    2. Why We Wrote This Book
    3. Who This Book Is For
    4. Objective and Approach
    5. Prerequisites
    6. How This Book Is Organized
    7. Conventions Used in This Book
    8. Using Code Examples
    9. O’Reilly Online Learning
    10. How to Contact Us
    11. Acknowledgments
  2. Chapter 1: Introduction to NLP
    1. What is NLP?
      1. Popular Applications
      2. History
      3. Inflection Points
      4. A Final Word
    2. Basic NLP
      1. Define NLP Tasks
      2. Setup Programming Environment
      3. Perform NLP Tasks using SpaCy
    3. Conclusion
  3. Chapter 2: Tools of the Trade
    1. Deep Learning Frameworks
      1. PyTorch
      2. TensorFlow
      3. Jax
      4. Julia
      5. Swift for TensorFlow
      6. Our Pick
    2. Visualization and Experiment Tracking
      1. TensorBoard
      2. Weights & Biases
      3. Comet
      4. Neptune
      5. Guild
      6. Our Pick
    3. AutoML
      1. H2O.ai
      2. Dataiku
      3. DataRobot
      4. Our Pick
    4. ML Infrastructure and Compute
      1. PaperSpace
      2. FloydHub
      3. Google Colab
      4. Kaggle Kernels
      5. Lambda GPU Cloud
      6. Our Pick
    5. Edge / On-Device Inference
      1. ONNX
      2. Core ML
      3. Edge Accelerators
      4. Our Pick
    6. Cloud Inference & Machine Learning as a Service (MLaaS)
      1. AWS
      2. Microsoft Azure
      3. Google Cloud Platform
      4. Our Pick
    7. CI/CD
      1. Our Pick
    8. Conclusion
  4. Chapter 3: NLP Applications
    1. Pretrained Models
    2. Transfer Learning
    3. Natural Language Dataset
      1. Explore AG Dataset
    4. Named Entity Recognition
      1. Perform Inference using Base SpaCy Model
      2. Custom NER
      3. Annotate via Prodigy - NER
      4. Train Custom NER Model using SpaCy
      5. Custom NER Model vs. Base NER Model
    5. Text Classification
      1. Annotate via Prodigy - Text Classification
      2. Train Text Classification Models using SpaCy
    6. Conclusion
  5. Chapter 4: Transformers and Transfer Learning
    1. Transfer learning With fastai
      1. Using the high-level fastai API
      2. ULMFiT for Transfer Learning
      3. Fine-tuning a language model on IMDb
      4. Training a text classifier
    2. Inference with Huggingface
      1. Loading Models
      2. Generating Predictions
    3. Conclusion
  6. Chapter 5: Preprocessing and Tokenization
    1. Huggingface Tokenizers
    2. Subtoken Tokenization
    3. Building Your Own Tokenizer
    4. How to Build a Fast Tokenizer
    5. Conclusion
  7. Chapter 6: Embeddings: How Machines “Understand” Words
    1. Understand Versus Reading Text
    2. Word Vectors
      1. Word2Vec
      2. Embeddings in the Age of Transfer Learning
    3. Notebook
      1. Preprocessing
      2. Model
      3. Training
      4. Validation
      5. Embedding Things That Aren’t Words
      6. Making Vectorized Music
    4. Conclusion
  8. Chapter 7: Recurrent Neural Networks and Other Sequences
    1. Recurrent Neural Networks
      1. Bidirectional RNN
      2. Sequence to Sequence Using RNNs
    2. Long Short Term Memory (LSTM)
    3. Gated Recurrent Units (GRU)
    4. The Future of RNNs
  9. Chapter 8: Productionizing NLP Models
    1. Data Scientists, Engineers, and Analysts
      1. Prototyping, Deployment, and Maintenance
      2. Notebooks and Scripts
    2. Unified Data Analytics Platform
      1. Support for Big Data
      2. Support For Multiple Programming Languages
      3. Support for ML Frameworks
      4. Support for Model Repository, Access Control, Data Lineage, and Versioning
    3. Databricks Setup
      1. Set Up Account
      2. Set Up Access to S3 Bucket
      3. Set Up Libraries
      4. Create Cluster
      5. Create Notebook
      6. Enable Init Script and Restart Cluster
      7. Run Speed Test - Inference on NER using SpaCy
    4. Machine Learning Jobs
      1. Production Pipeline Notebook
      2. Scheduled Machine Learning Jobs
      3. Event-Driven Machine Learning Pipeline
    5. MLflow
      1. Log and Register Model
      2. MLflow Model Serving
    6. Machine Learning in a Web App
      1. Build Streamlit App
      2. Deploy Streamlit App
      3. Explore Streamlit Web App
      4. Build and Deploy Streamlit App for Custom NER
      5. Build and Deploy Streamlit App for Text Classification on AGNews Dataset
      6. Build and Deploy Streamlit App for Text Classification on Custom Text
    7. Conclusion

Product information

  • Title: Applied Natural Language Processing in the Enterprise
  • Author(s): Ankur A. Patel, Ajay Uppili Arasanipalai
  • Release date: June 2021
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492062578