Mastering spaCy

Book description

Build end-to-end industrial-strength NLP models using advanced morphological and syntactic features in spaCy to create real-world applications with ease

Key Features

  • Gain an overview of what spaCy offers for natural language processing
  • Learn details of spaCy's features and how to use them effectively
  • Work through practical recipes using spaCy

Book Description

spaCy is an efficient, industrial-grade Python library for NLP. It offers various pre-trained models and ready-to-use features. Mastering spaCy provides you with end-to-end coverage of spaCy's features and real-world applications.

You'll begin by installing spaCy and downloading models, before progressing to spaCy's features and prototyping real-world NLP apps. Next, you'll get familiar with visualizing text using displaCy, spaCy's popular visualizer. The book also equips you with practical illustrations for pattern matching and helps you advance into the world of semantics with word vectors. Statistical information extraction methods are also explained in detail. Later, you'll cover an interactive business case study that shows you how to combine all of spaCy's features to create a real-world NLP pipeline. You'll implement ML models for tasks such as sentiment analysis, intent recognition, and context resolution. The book further focuses on classification with popular frameworks such as TensorFlow's Keras API together with spaCy. You'll cover popular topics, including intent classification and sentiment analysis, apply them to popular datasets, and interpret the classification results.

By the end of this book, you'll be able to confidently use spaCy, including its linguistic features, word vectors, and classifiers, to create your own NLP apps.

What you will learn

  • Install spaCy, get started easily, and write your first Python script
  • Understand core linguistic operations of spaCy
  • Discover how to combine rule-based components with spaCy statistical models
  • Become well-versed in named entity and keyword extraction
  • Build your own ML pipelines using spaCy
  • Apply all the knowledge you've gained to design a chatbot using spaCy

Who this book is for

This book is for data scientists and machine learning practitioners who want to excel in NLP, as well as NLP developers who want to master spaCy and build applications with it. Language and speech professionals who want to get hands-on with Python and spaCy, and software developers who want to quickly prototype applications with spaCy, will also find this book helpful. Beginner-level knowledge of the Python programming language is required to get the most out of this book. A beginner-level understanding of linguistic concepts such as parsing, POS tags, and semantic similarity will also be useful.

Table of contents

  1. Mastering spaCy
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  6. Section 1: Getting Started with spaCy
  7. Chapter 1: Getting Started with spaCy
    1. Technical requirements
    2. Overview of spaCy
      1. Rise of NLP
      2. NLP with Python
      3. Reviewing some useful string operations
      4. Getting a high-level overview of the spaCy library
      5. Tips for the reader
    3. Installing spaCy
      1. Installing spaCy with pip
      2. Installing spaCy with conda
      3. Installing spaCy on macOS/OS X
      4. Installing spaCy on Windows
      5. Troubleshooting while installing spaCy
    4. Installing spaCy's statistical models
      1. Installing language models
    5. Visualization with displaCy
      1. Getting started with displaCy
      2. Entity visualizer
      3. Visualizing within Python
      4. Using displaCy in Jupyter notebooks
      5. Exporting displaCy graphics as an image file
    6. Summary
  8. Chapter 2: Core Operations with spaCy
    1. Technical requirements
    2. Overview of spaCy conventions
    3. Introducing tokenization
      1. Customizing the tokenizer
      2. Debugging the tokenizer
      3. Sentence segmentation
    4. Understanding lemmatization
      1. Lemmatization in NLU
      2. Understanding the difference between lemmatization and stemming
    5. spaCy container objects
      1. Doc
      2. Token
      3. Span
    6. More spaCy features
    7. Summary
  9. Section 2: spaCy Features
  10. Chapter 3: Linguistic Features
    1. Technical requirements
    2. What is POS tagging?
      1. WSD
      2. Verb tense and aspect in NLU applications
      3. Understanding number, symbol, and punctuation tags
    3. Introduction to dependency parsing
      1. What is dependency parsing?
      2. Dependency relations
      3. Syntactic relations
    4. Introducing NER
      1. A real-world example
    5. Merging and splitting tokens
    6. Summary
  11. Chapter 4: Rule-Based Matching
    1. Token-based matching
      1. Extended syntax support
      2. Regex-like operators
      3. Regex support
      4. Matcher online demo
    2. PhraseMatcher
    3. EntityRuler
    4. Combining spaCy models and matchers
      1. Extracting IBAN and account numbers
      2. Extracting phone numbers
      3. Extracting mentions
      4. Hashtag and emoji extraction
      5. Expanding named entities
      6. Combining linguistic features and named entities
    5. Summary
  12. Chapter 5: Working with Word Vectors and Semantic Similarity
    1. Technical requirements
    2. Understanding word vectors
      1. One-hot encoding
      2. Word vectors
      3. Analogies and vector operations
      4. How word vectors are produced
    3. Using spaCy's pretrained vectors
      1. The similarity method
    4. Using third-party word vectors
    5. Advanced semantic similarity methods
      1. Understanding semantic similarity
      2. Categorizing text with semantic similarity
      3. Extracting key phrases
      4. Extracting and comparing named entities
    6. Summary
  13. Chapter 6: Putting Everything Together: Semantic Parsing with spaCy
    1. Technical requirements
    2. Extracting named entities
      1. Getting to know the ATIS dataset
      2. Extracting named entities with Matcher
      3. Using dependency trees for extracting entities
    3. Using dependency relations for intent recognition
      1. Linguistic primer
      2. Extracting transitive verbs and their direct objects
      3. Extracting multiple intents with conjunction relation
      4. Recognizing the intent using wordlists
    4. Semantic similarity methods for semantic parsing
      1. Using synonyms lists for semantic similarity
      2. Using word vectors to recognize semantic similarity
    5. Putting it all together
    6. Summary
  14. Section 3: Machine Learning with spaCy
  15. Chapter 7: Customizing spaCy Models
    1. Technical requirements
    2. Getting started with data preparation
      1. Do spaCy models perform well enough on your data?
      2. Does your domain include many labels that are absent in spaCy models?
    3. Annotating and preparing data
      1. Annotating data with Prodigy
      2. Annotating data with Brat
      3. spaCy training data format
    4. Updating an existing pipeline component
      1. Disabling the other statistical models
      2. Model training procedure
      3. Evaluating the updated NER
      4. Saving and loading custom models
    5. Training a pipeline component from scratch
      1. Working with a real-world dataset
    6. Summary
  16. Chapter 8: Text Classification with spaCy
    1. Technical requirements
    2. Understanding the basics of text classification
    3. Training the spaCy text classifier
      1. Getting to know TextCategorizer class
      2. Formatting training data for the TextCategorizer
      3. Defining the training loop
      4. Testing the new component
      5. Training TextCategorizer for multilabel classification
    4. Sentiment analysis with spaCy
      1. Exploring the dataset
      2. Training the TextClassifier component
    5. Text classification with spaCy and Keras
      1. What is a layer?
      2. Sequential modeling with LSTMs
      3. Keras Tokenizer
      4. Embedding words
      5. Neural network architecture for text classification
    6. Summary
    7. References
  17. Chapter 9: spaCy and Transformers
    1. Technical requirements
    2. Transformers and transfer learning
    3. Understanding BERT
      1. BERT architecture
      2. BERT input format
      3. How is BERT trained?
    4. Transformers and TensorFlow
      1. HuggingFace Transformers
      2. Using the BERT tokenizer
      3. Obtaining BERT word vectors
      4. Using BERT for text classification
      5. Using Transformer pipelines
    5. Transformers and spaCy
    6. Summary
  18. Chapter 10: Putting Everything Together: Designing Your Chatbot with spaCy
    1. Technical requirements
    2. Introduction to conversational AI
      1. NLP components of conversational AI products
      2. Getting to know the dataset
    3. Entity extraction
      1. Extracting city entities
      2. Extracting date and time entities
      3. Extracting phone numbers
      4. Extracting cuisine types
    4. Intent recognition
      1. Pattern-based text classification
      2. Classifying text with a character-level LSTM
      3. Differentiating subjects from objects
      4. Parsing the sentence type
      5. Anaphora resolution
    5. Summary
    6. References
    7. Why subscribe?
  19. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Leave a review - let other readers know what you think

Product information

  • Title: Mastering spaCy
  • Author(s): Duygu Altinok
  • Release date: July 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781800563353