Practical Natural Language Processing

Book description

Many books and courses tackle natural language processing (NLP) problems with toy use cases and well-defined datasets. But if you want to build, iterate, and scale NLP systems in a business setting and tailor them for particular industry verticals, this is your guide. Software engineers and data scientists will learn how to navigate the maze of options available at each step of the journey.

Over the course of the book, authors Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana will guide you through the process of building real-world NLP solutions embedded in larger product setups. You’ll learn how to adapt your solutions for different industry verticals such as healthcare, social media, and retail.

With this book, you’ll:

  • Understand the wide spectrum of problem statements, tasks, and solution approaches within NLP
  • Implement and evaluate different NLP applications using machine learning and deep learning methods
  • Fine-tune your NLP solution based on your business problem and industry vertical
  • Evaluate various algorithms and approaches for NLP product tasks, datasets, and stages
  • Produce software solutions following best practices around release, deployment, and DevOps for NLP systems
  • Understand best practices, opportunities, and the roadmap for NLP from a business and product leader’s perspective

Table of contents

  1. Foreword
  2. Preface
    1. Why We Wrote This Book
    2. The Philosophy
    3. Scope
    4. Who Should Read This Book
    5. What You Will Learn
    6. Structure of the Book
    7. How to Read This Book
      1. Conventions Used in This Book
      2. Using Code Examples
      3. O’Reilly Online Learning
      4. How to Contact Us
      5. Further Information
      6. Acknowledgments
  3. I. Foundations
  4. 1. NLP: A Primer
    1. NLP in the Real World
      1. NLP Tasks
    2. What Is Language?
      1. Building Blocks of Language
      2. Why Is NLP Challenging?
    3. Machine Learning, Deep Learning, and NLP: An Overview
    4. Approaches to NLP
      1. Heuristics-Based NLP
      2. Machine Learning for NLP
      3. Deep Learning for NLP
      4. Why Deep Learning Is Not Yet the Silver Bullet for NLP
    5. An NLP Walkthrough: Conversational Agents
    6. Wrapping Up
  5. 2. NLP Pipeline
    1. Data Acquisition
    2. Text Extraction and Cleanup
      1. HTML Parsing and Cleanup
      2. Unicode Normalization
      3. Spelling Correction
      4. System-Specific Error Correction
    3. Pre-Processing
      1. Preliminaries
      2. Frequent Steps
      3. Other Pre-Processing Steps
      4. Advanced Processing
    4. Feature Engineering
      1. Classical NLP/ML Pipeline
      2. DL Pipeline
    5. Modeling
      1. Start with Simple Heuristics
      2. Building Your Model
      3. Building THE Model
    6. Evaluation
      1. Intrinsic Evaluation
      2. Extrinsic Evaluation
    7. Post-Modeling Phases
      1. Deployment
      2. Monitoring
      3. Model Updating
    8. Working with Other Languages
    9. Case Study
    10. Wrapping Up
  6. 3. Text Representation
    1. Vector Space Models
    2. Basic Vectorization Approaches
      1. One-Hot Encoding
      2. Bag of Words
      3. Bag of N-Grams
      4. TF-IDF
    3. Distributed Representations
      1. Word Embeddings
      2. Going Beyond Words
    4. Distributed Representations Beyond Words and Characters
    5. Universal Text Representations
    6. Visualizing Embeddings
    7. Handcrafted Feature Representations
    8. Wrapping Up
  7. II. Essentials
  8. 4. Text Classification
    1. Applications
    2. A Pipeline for Building Text Classification Systems
      1. A Simple Classifier Without the Text Classification Pipeline
      2. Using Existing Text Classification APIs
    3. One Pipeline, Many Classifiers
      1. Naive Bayes Classifier
      2. Logistic Regression
      3. Support Vector Machine
    4. Using Neural Embeddings in Text Classification
      1. Word Embeddings
      2. Subword Embeddings and fastText
      3. Document Embeddings
    5. Deep Learning for Text Classification
      1. CNNs for Text Classification
      2. LSTMs for Text Classification
      3. Text Classification with Large, Pre-Trained Language Models
    6. Interpreting Text Classification Models
      1. Explaining Classifier Predictions with Lime
    7. Learning with No or Less Data and Adapting to New Domains
      1. No Training Data
      2. Less Training Data: Active Learning and Domain Adaptation
    8. Case Study: Corporate Ticketing
    9. Practical Advice
    10. Wrapping Up
  9. 5. Information Extraction
    1. IE Applications
    2. IE Tasks
    3. The General Pipeline for IE
    4. Keyphrase Extraction
      1. Implementing KPE
      2. Practical Advice
    5. Named Entity Recognition
      1. Building an NER System
      2. NER Using an Existing Library
      3. NER Using Active Learning
      4. Practical Advice
    6. Named Entity Disambiguation and Linking
      1. NEL Using Azure API
    7. Relationship Extraction
      1. Approaches to RE
      2. RE with the Watson API
    8. Other Advanced IE Tasks
      1. Temporal Information Extraction
      2. Event Extraction
      3. Template Filling
    9. Case Study
    10. Wrapping Up
  10. 6. Chatbots
    1. Applications
      1. A Simple FAQ Bot
    2. A Taxonomy of Chatbots
      1. Goal-Oriented Dialog
      2. Chitchats
    3. A Pipeline for Building Dialog Systems
    4. Dialog Systems in Detail
      1. PizzaStop Chatbot
    5. Deep Dive into Components of a Dialog System
      1. Dialog Act Classification
      2. Identifying Slots
      3. Response Generation
      4. Dialog Examples with Code Walkthrough
    6. Other Dialog Pipelines
      1. End-to-End Approach
      2. Deep Reinforcement Learning for Dialogue Generation
      3. Human-in-the-Loop
    7. Rasa NLU
    8. A Case Study: Recipe Recommendations
      1. Utilizing Existing Frameworks
      2. Open-Ended Generative Chatbots
    9. Wrapping Up
  11. 7. Topics in Brief
    1. Search and Information Retrieval
      1. Components of a Search Engine
      2. A Typical Enterprise Search Pipeline
      3. Setting Up a Search Engine: An Example
      4. A Case Study: Book Store Search
    2. Topic Modeling
      1. Training a Topic Model: An Example
      2. What’s Next?
    3. Text Summarization
      1. Summarization Use Cases
      2. Setting Up a Summarizer: An Example
      3. Practical Advice
    4. Recommender Systems for Textual Data
      1. Creating a Book Recommender System: An Example
      2. Practical Advice
    5. Machine Translation
      1. Using a Machine Translation API: An Example
      2. Practical Advice
    6. Question-Answering Systems
      1. Developing a Custom Question-Answering System
      2. Looking for Deeper Answers
    7. Wrapping Up
  12. III. Applied
  13. 8. Social Media
    1. Applications
    2. Unique Challenges
    3. NLP for Social Data
      1. Word Cloud
      2. Tokenizer for SMTD
      3. Trending Topics
      4. Understanding Twitter Sentiment
      5. Pre-Processing SMTD
      6. Text Representation for SMTD
      7. Customer Support on Social Channels
    4. Memes and Fake News
      1. Identifying Memes
      2. Fake News
    5. Wrapping Up
  14. 9. E-Commerce and Retail
    1. E-Commerce Catalog
      1. Review Analysis
      2. Product Search
      3. Product Recommendations
    2. Search in E-Commerce
    3. Building an E-Commerce Catalog
      1. Attribute Extraction
      2. Product Categorization and Taxonomy
      3. Product Enrichment
      4. Product Deduplication and Matching
    4. Review Analysis
      1. Sentiment Analysis
      2. Aspect-Level Sentiment Analysis
      3. Connecting Overall Ratings to Aspects
      4. Understanding Aspects
    5. Recommendations for E-Commerce
      1. A Case Study: Substitutes and Complements
    6. Wrapping Up
  15. 10. Healthcare, Finance, and Law
    1. Healthcare
      1. Health and Medical Records
      2. Patient Prioritization and Billing
      3. Pharmacovigilance
      4. Clinical Decision Support Systems
      5. Health Assistants
      6. Electronic Health Records
      7. Mental Healthcare Monitoring
      8. Medical Information Extraction and Analysis
    2. Finance and Law
      1. NLP Applications in Finance
      2. NLP and the Legal Landscape
    3. Wrapping Up
  16. IV. Bringing It All Together
  17. 11. The End-to-End NLP Process
    1. Revisiting the NLP Pipeline: Deploying NLP Software
      1. An Example Scenario
    2. Building and Maintaining a Mature System
      1. Finding Better Features
      2. Iterating Existing Models
      3. Code and Model Reproducibility
      4. Troubleshooting and Interpretability
      5. Monitoring
      6. Minimizing Technical Debt
      7. Automating Machine Learning
    3. The Data Science Process
      1. The KDD Process
      2. Microsoft Team Data Science Process
    4. Making AI Succeed at Your Organization
      1. Team
      2. Right Problem and Right Expectations
      3. Data and Timing
      4. A Good Process
      5. Other Aspects
    5. Peeking over the Horizon
    6. Final Words
  18. Index

Product information

  • Title: Practical Natural Language Processing
  • Author(s): Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, Harshit Surana
  • Release date: June 2020
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492054054