Hands-On Python Natural Language Processing

Book description

Get well-versed with traditional as well as modern natural language processing concepts and techniques

Key Features

  • Perform various NLP tasks to build linguistic applications using Python libraries
  • Understand, analyze, and generate text to provide accurate results
  • Interpret human language using various NLP concepts, methodologies, and tools

Book Description

Natural Language Processing (NLP) is the subfield of computational linguistics that enables computers to understand, process, and analyze text. This book caters to the demand for hands-on training in NLP concepts and provides exposure to real-world applications along with a solid theoretical grounding.

This book starts by introducing you to the field of NLP and its applications, along with the modern Python libraries you'll use to build your NLP-powered apps. With the help of practical examples, you'll learn how to build reasonably sophisticated NLP applications and explore the methodologies and challenges of deploying NLP applications in the real world. You'll work through key NLP tasks such as text classification, semantic embedding, sentiment analysis, machine translation, and developing a chatbot using machine learning and deep learning techniques. The book will also help you discover how machine learning techniques play a vital role in making your linguistic apps smart. Every chapter is accompanied by examples of real-world applications to help you build impressive NLP applications of your own.

By the end of this NLP book, you'll be able to work with language data, use machine learning to identify patterns in text, and get acquainted with the advancements in NLP.

What you will learn

  • Understand how NLP powers modern applications
  • Explore key NLP techniques to build your natural language vocabulary
  • Transform text data into mathematical data structures and learn how to improve text mining models
  • Discover how various neural network architectures work with natural language data
  • Get the hang of building sophisticated text processing models using machine learning and deep learning
  • Check out state-of-the-art architectures that have revolutionized research in the NLP domain

Who this book is for

This NLP Python book is for anyone who wants to learn both the theoretical and practical aspects of NLP. It starts with the basics and gradually progresses to advanced concepts, making it easy to follow for readers with varying levels of NLP proficiency. This comprehensive guide will help you develop a thorough understanding of NLP methodologies for building linguistic applications; however, a working knowledge of the Python programming language and high-school-level mathematics is expected.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Python Natural Language Processing
  3. About Packt
    1. Why subscribe?
  4. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Section 1: Introduction
  7. Understanding the Basics of NLP
    1. Programming languages versus natural languages
      1. Understanding NLP
    2. Why should I learn NLP?
    3. Current applications of NLP
      1. Chatbots
      2. Sentiment analysis
      3. Machine translation
      4. Named-entity recognition
      5. Future applications of NLP
    4. Summary
  8. NLP Using Python
    1. Technical requirements
    2. Understanding Python with NLP
      1. Python's utility in NLP
    3. Important Python libraries
      1. NLTK
        1. NLTK corpora
          1. Text processing
          2. Part of speech tagging
      2. TextBlob
        1. Sentiment analysis
        2. Machine translation
        3. Part of speech tagging
      3. VADER
    4. Web scraping libraries and methodology
    5. Overview of Jupyter Notebook
    6. Summary
  9. Section 2: Natural Language Representation and Mathematics
  10. Building Your NLP Vocabulary
    1. Technical requirements
    2. Lexicons
    3. Phonemes, graphemes, and morphemes
    4. Tokenization
      1. Issues with tokenization
      2. Different types of tokenizers
        1. Regular expressions
        2. Regular expressions-based tokenizers
        3. Treebank tokenizer
        4. TweetTokenizer
    5. Understanding word normalization
      1. Stemming
        1. Over-stemming and under-stemming
      2. Lemmatization
        1. WordNet lemmatizer
        2. spaCy lemmatizer
      3. Stopword removal
      4. Case folding
      5. N-grams
      6. Taking care of HTML tags
      7. How does all this fit into my NLP pipeline?
    6. Summary
  11. Transforming Text into Data Structures
    1. Technical requirements
    2. Understanding vectors and matrices
      1. Vectors
      2. Matrices
    3. Exploring the Bag-of-Words architecture
      1. Understanding a basic CountVectorizer
      2. Out-of-the-box features offered by CountVectorizer
        1. Prebuilt dictionary and support for n-grams
        2. max_features
        3. min_df and max_df thresholds
      3. Limitations of the BoW representation
    4. TF-IDF vectors
      1. Building a basic TF-IDF vectorizer
      2. N-grams and maximum features in the TF-IDF vectorizer
      3. Limitations of the TF-IDF vectorizer's representation
    5. Distance/similarity calculation between document vectors
      1. Cosine similarity
        1. Solving Cosine math
        2. Cosine similarity on vectors developed using CountVectorizer
        3. Cosine similarity on vectors developed using the TfidfVectorizer tool
    6. One-hot vectorization
    7. Building a basic chatbot
    8. Summary
  12. Word Embeddings and Distance Measurements for Text
    1. Technical requirements
    2. Understanding word embeddings
    3. Demystifying Word2vec
      1. Supervised and unsupervised learning
      2. Word2vec – supervised or unsupervised?
      3. Pretrained Word2vec
      4. Exploring the pretrained Word2vec model using gensim
      5. The Word2vec architecture
        1. The Skip-gram method
          1. How do you define target and context words?
        2. Exploring the components of a Skip-gram model
          1. Input vector
          2. Embedding matrix
          3. Context matrix
          4. Output vector
          5. Softmax
          6. Loss calculation and backpropagation
          7. Inference
        3. The CBOW method
        4. Computational limitations of the methods discussed and how to overcome them
          1. Subsampling
          2. Negative sampling
          3. How to select negative samples
    4. Training a Word2vec model
      1. Building a basic Word2vec model
      2. Modifying the min_count parameter
      3. Playing with the vector size
      4. Other important configurable parameters
      5. Limitations of Word2vec
      6. Applications of the Word2vec model
    5. Word mover’s distance
    6. Summary
  13. Exploring Sentence-, Document-, and Character-Level Embeddings
    1. Technical requirements
    2. Venturing into Doc2Vec
      1. Building a Doc2Vec model
        1. Changing vector size and min_count
        2. The dm parameter for switching between modeling approaches
        3. The dm_concat parameter
        4. The dm_mean parameter
        5. Window size
        6. Learning rate
    3. Exploring fastText
      1. Building a fastText model
      2. Building a spelling corrector/word suggestion module using fastText
      3. fastText and document distances
    4. Understanding Sent2Vec and the Universal Sentence Encoder
      1. Sent2Vec
      2. The Universal Sentence Encoder
    5. Summary
  14. Section 3: NLP and Learning
  15. Identifying Patterns in Text Using Machine Learning
    1. Technical requirements
    2. Introduction to ML
    3. Data preprocessing
      1. NaN values
      2. Label encoding and one-hot encoding
      3. Data standardization
        1. Min-max standardization
        2. Z-score standardization
    4. The Naive Bayes algorithm
      1. Building a sentiment analyzer using the Naive Bayes algorithm
    5. The SVM algorithm
      1. Building a sentiment analyzer using SVM
    6. Productionizing a trained sentiment analyzer
    7. Summary
  16. From Human Neurons to Artificial Neurons for Understanding Text
    1. Technical requirements
    2. Exploring the biology behind neural networks
      1. Neurons
      2. Activation functions
        1. Sigmoid
        2. Tanh activation
        3. Rectified linear unit
      3. Layers in an ANN
    3. How does a neural network learn?
      1. How does the network get better at making predictions?
    4. Understanding regularization
      1. Dropout
    5. Let's talk Keras
    6. Building a question classifier using neural networks
    7. Summary
  17. Applying Convolutions to Text
    1. Technical requirements
    2. What is a CNN?
      1. Understanding convolutions
        1. Let's pad our data
        2. Understanding strides in a CNN
      2. What is pooling?
      3. The fully connected layer
    3. Detecting sarcasm in text using CNNs
      1. Loading the libraries and the dataset
      2. Performing basic data analysis and preprocessing our data
      3. Loading the Word2vec model and vectorizing our data
      4. Splitting our dataset into train and test sets
      5. Building the model
      6. Evaluating and saving our model
    4. Summary
  18. Capturing Temporal Relationships in Text
    1. Technical requirements
    2. Baby steps toward understanding RNNs
      1. Forward propagation in an RNN
      2. Backpropagation through time in an RNN
    3. Vanishing and exploding gradients
    4. Architectural forms of RNNs
      1. Different flavors of RNN
      2. Carrying relationships both ways using bidirectional RNNs
      3. Going deep with RNNs
    5. Giving memory to our networks – LSTMs
      1. Understanding an LSTM cell
        1. Forget gate
        2. Input gate
        3. Output gate
      2. Backpropagation through time in LSTMs
    6. Building a text generator using LSTMs
    7. Exploring memory-based variants of the RNN architecture
      1. GRUs
      2. Stacked LSTMs
    8. Summary
  19. State of the Art in NLP
    1. Technical requirements
    2. Seq2Seq modeling
      1. Encoders
      2. Decoders
        1. The training phase
        2. The inference phase
    3. Translating between languages using Seq2Seq modeling
    4. Let's pay some attention
    5. Transformers
      1. Understanding the architecture of Transformers
        1. Encoders
        2. Decoders
        3. Self-attention
          1. How does self-attention work mathematically?
          2. A small note on masked self-attention
        4. Feedforward neural networks
        5. Residuals and layer normalization
        6. Positional embeddings
        7. How the decoder works
        8. The linear layer and the softmax function
        9. Transformer model summary
    6. BERT
      1. The BERT architecture
      2. The BERT model input and output
      3. How did BERT pre-training happen?
        1. The masked language model
        2. Next-sentence prediction
      4. BERT fine-tuning
    7. Summary
  20. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Python Natural Language Processing
  • Author(s): Aman Kedia, Mayank Rasu
  • Release date: June 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781838989590