Natural Language Processing with TensorFlow

Book Description

Write modern natural language processing applications using deep learning algorithms and TensorFlow

About This Book
  • Focuses on more efficient natural language processing using TensorFlow
  • Covers NLP as a field in its own right to improve understanding for choosing TensorFlow tools and other deep learning approaches
  • Provides choices for how to process and evaluate large unstructured text datasets
  • Shows how to apply the TensorFlow toolbox to specific tasks in the most interesting field in artificial intelligence
Who This Book Is For

This book is for Python developers with a strong interest in deep learning who want to learn how to leverage TensorFlow to simplify NLP tasks. Fundamental Python skills are assumed, as well as some knowledge of machine learning and undergraduate-level calculus and linear algebra. No previous natural language processing experience is required, although some background in NLP or computational linguistics will be helpful.

What You Will Learn
  • Core concepts of NLP and various approaches to natural language processing
  • How to solve NLP tasks by applying TensorFlow functions to create neural networks
  • Strategies to process large amounts of data into word representations that can be used by deep learning applications
  • Techniques for performing sentence classification and language generation using CNNs and RNNs
  • How to employ state-of-the-art RNNs, such as long short-term memory, to solve complex text generation tasks
  • How to write automatic translation programs and implement an actual neural machine translator from scratch
  • The trends and innovations that are paving the future in NLP
In Detail

Natural language processing (NLP) supplies the majority of data available to deep learning applications, while TensorFlow is the most important deep learning framework currently available. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you invaluable tools to work with the immense volume of unstructured data in today's data streams, and apply these tools to specific NLP tasks.

Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, like long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator.

After reading this book, you will understand NLP and have the skills to apply TensorFlow in deep learning NLP applications and to perform specific NLP tasks.

Style and approach

The book emphasizes both the theory and practice of natural language processing. It introduces the reader to existing TensorFlow functions and explains how to apply them while writing NLP algorithms. The popular Word2vec method is used to teach the essential process of learning word representations. The book focuses on how to apply classical deep learning to NLP and also explores cutting-edge and emerging approaches. Specific examples are used to make the concepts and techniques concrete.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Natural Language Processing with TensorFlow
    1. Table of Contents
    2. Natural Language Processing with TensorFlow
      1. Why subscribe?
      2. PacktPub.com
    3. Contributors
      1. About the author
      2. About the reviewers
      3. Packt is searching for authors like you
    4. Preface
      1. Who this book is for
      2. What this book covers
      3. To get the most out of this book
        1. Download the example code files
        2. Download the color images
        3. Conventions used
      4. Get in touch
        1. Reviews
    5. 1. Introduction to Natural Language Processing
      1. What is Natural Language Processing?
      2. Tasks of Natural Language Processing
      3. The traditional approach to Natural Language Processing
        1. Understanding the traditional approach
          1. Example – generating football game summaries
        2. Drawbacks of the traditional approach
      4. The deep learning approach to Natural Language Processing
        1. History of deep learning
        2. The current state of deep learning and NLP
        3. Understanding a simple deep model – a Fully-Connected Neural Network
      5. The roadmap – beyond this chapter
      6. Introduction to the technical tools
        1. Description of the tools
        2. Installing Python and scikit-learn
        3. Installing Jupyter Notebook
        4. Installing TensorFlow
      7. Summary
    6. 2. Understanding TensorFlow
      1. What is TensorFlow?
        1. Getting started with TensorFlow
        2. TensorFlow client in detail
        3. TensorFlow architecture – what happens when you execute the client?
        4. Cafe Le TensorFlow – understanding TensorFlow with an analogy
      2. Inputs, variables, outputs, and operations
        1. Defining inputs in TensorFlow
          1. Feeding data with Python code
          2. Preloading and storing data as tensors
          3. Building an input pipeline
        2. Defining variables in TensorFlow
        3. Defining TensorFlow outputs
        4. Defining TensorFlow operations
          1. Comparison operations
          2. Mathematical operations
          3. Scatter and gather operations
          4. Neural network-related operations
            1. Nonlinear activations used by neural networks
            2. The convolution operation
            3. The pooling operation
            4. Defining loss
            5. Optimization of neural networks
            6. The control flow operations
      3. Reusing variables with scoping
      4. Implementing our first neural network
        1. Preparing the data
        2. Defining the TensorFlow graph
        3. Running the neural network
      5. Summary
    7. 3. Word2vec – Learning Word Embeddings
      1. What is a word representation or meaning?
      2. Classical approaches to learning word representation
        1. WordNet – using an external lexical knowledge base for learning word representations
          1. Tour of WordNet
          2. Problems with WordNet
        2. One-hot encoded representation
        3. The TF-IDF method
        4. Co-occurrence matrix
      3. Word2vec – a neural network-based approach to learning word representation
        1. Exercise: is queen = king – he + she?
        2. Designing a loss function for learning word embeddings
      4. The skip-gram algorithm
        1. From raw text to structured data
        2. Learning the word embeddings with a neural network
          1. Formulating a practical loss function
          2. Efficiently approximating the loss function
            1. Negative sampling of the softmax layer
            2. Hierarchical softmax
            3. Learning the hierarchy
            4. Optimizing the learning model
        3. Implementing skip-gram with TensorFlow
      5. The Continuous Bag-of-Words algorithm
        1. Implementing CBOW in TensorFlow
      6. Summary
    8. 4. Advanced Word2vec
      1. The original skip-gram algorithm
        1. Implementing the original skip-gram algorithm
        2. Comparing the original skip-gram with the improved skip-gram
      2. Comparing skip-gram with CBOW
        1. Performance comparison
        2. Which is the winner, skip-gram or CBOW?
      3. Extensions to the word embeddings algorithms
        1. Using the unigram distribution for negative sampling
        2. Implementing unigram-based negative sampling
        3. Subsampling – probabilistically ignoring the common words
        4. Implementing subsampling
        5. Comparing the CBOW and its extensions
      4. More recent algorithms extending skip-gram and CBOW
        1. A limitation of the skip-gram algorithm
        2. The structured skip-gram algorithm
        3. The loss function
        4. The continuous window model
      5. GloVe – Global Vectors representation
        1. Understanding GloVe
        2. Implementing GloVe
      6. Document classification with Word2vec
        1. Dataset
        2. Classifying documents with word embeddings
        3. Implementation – learning word embeddings
        4. Implementation – word embeddings to document embeddings
        5. Document clustering and t-SNE visualization of embedded documents
        6. Inspecting several outliers
        7. Implementation – clustering/classification of documents with K-means
      7. Summary
    9. 5. Sentence Classification with Convolutional Neural Networks
      1. Introducing Convolution Neural Networks
        1. CNN fundamentals
        2. The power of Convolution Neural Networks
      2. Understanding Convolution Neural Networks
        1. Convolution operation
          1. Standard convolution operation
          2. Convolving with stride
          3. Convolving with padding
          4. Transposed convolution
        2. Pooling operation
          1. Max pooling
          2. Max pooling with stride
          3. Average pooling
        3. Fully connected layers
        4. Putting everything together
      3. Exercise – image classification on MNIST with CNN
        1. About the data
        2. Implementing the CNN
        3. Analyzing the predictions produced with a CNN
      4. Using CNNs for sentence classification
        1. CNN structure
          1. Data transformation
          2. The convolution operation
        2. Pooling over time
        3. Implementation – sentence classification with CNNs
      5. Summary
    10. 6. Recurrent Neural Networks
      1. Understanding Recurrent Neural Networks
        1. The problem with feed-forward neural networks
        2. Modeling with Recurrent Neural Networks
        3. Technical description of a Recurrent Neural Network
      2. Backpropagation Through Time
        1. How backpropagation works
        2. Why we cannot use BP directly for RNNs
        3. Backpropagation Through Time – training RNNs
        4. Truncated BPTT – training RNNs efficiently
        5. Limitations of BPTT – vanishing and exploding gradients
      3. Applications of RNNs
        1. One-to-one RNNs
        2. One-to-many RNNs
        3. Many-to-one RNNs
        4. Many-to-many RNNs
      4. Generating text with RNNs
        1. Defining hyperparameters
        2. Unrolling the inputs over time for Truncated BPTT
        3. Defining the validation dataset
        4. Defining weights and biases
        5. Defining state persisting variables
        6. Calculating the hidden states and outputs with unrolled inputs
        7. Calculating the loss
        8. Resetting state at the beginning of a new segment of text
        9. Calculating validation output
        10. Calculating gradients and optimizing
        11. Outputting a freshly generated chunk of text
      5. Evaluating text results output from the RNN
      6. Perplexity – measuring the quality of the text result
      7. Recurrent Neural Networks with Context Features – RNNs with longer memory
        1. Technical description of the RNN-CF
        2. Implementing the RNN-CF
          1. Defining the RNN-CF hyperparameters
          2. Defining input and output placeholders
          3. Defining weights of the RNN-CF
          4. Variables and operations for maintaining hidden and context states
          5. Calculating output
          6. Calculating the loss
          7. Calculating validation output
          8. Computing test output
          9. Computing the gradients and optimizing
        3. Text generated with the RNN-CF
      8. Summary
    11. 7. Long Short-Term Memory Networks
      1. Understanding Long Short-Term Memory Networks
        1. What is an LSTM?
        2. LSTMs in more detail
        3. How LSTMs differ from standard RNNs
      2. How LSTMs solve the vanishing gradient problem
        1. Improving LSTMs
        2. Greedy sampling
        3. Beam search
        4. Using word vectors
        5. Bidirectional LSTMs (BiLSTM)
      3. Other variants of LSTMs
        1. Peephole connections
        2. Gated Recurrent Units
      4. Summary
    12. 8. Applications of LSTM – Generating Text
      1. Our data
        1. About the dataset
        2. Preprocessing data
      2. Implementing an LSTM
        1. Defining hyperparameters
        2. Defining parameters
        3. Defining an LSTM cell and its operations
        4. Defining inputs and labels
        5. Defining sequential calculations required to process sequential data
        6. Defining the optimizer
        7. Decaying learning rate over time
        8. Making predictions
        9. Calculating perplexity (loss)
        10. Resetting states
        11. Greedy sampling to break unimodality
        12. Generating new text
        13. Example generated text
      3. Comparing LSTMs to LSTMs with peephole connections and GRUs
        1. Standard LSTM
          1. Review
          2. Example generated text
        2. Gated Recurrent Units (GRUs)
          1. Review
          2. The code
          3. Example generated text
        3. LSTMs with peepholes
          1. Review
          2. The code
          3. Example generated text
        4. Training and validation perplexities over time
      4. Improving LSTMs – beam search
        1. Implementing beam search
        2. Examples generated with beam search
      5. Improving LSTMs – generating text with words instead of n-grams
        1. The curse of dimensionality
        2. Word2vec to the rescue
        3. Generating text with Word2vec
        4. Examples generated with LSTM-Word2vec and beam search
        5. Perplexity over time
      6. Using the TensorFlow RNN API
      7. Summary
    13. 9. Applications of LSTM – Image Caption Generation
      1. Getting to know the data
        1. ILSVRC ImageNet dataset
        2. The MS-COCO dataset
      2. The machine learning pipeline for image caption generation
      3. Extracting image features with CNNs
      4. Implementation – loading weights and inferencing with VGG-
        1. Building and updating variables
        2. Preprocessing inputs
        3. Inferring VGG-16
        4. Extracting vectorized representations of images
        5. Predicting class probabilities with VGG-16
      5. Learning word embeddings
      6. Preparing captions for feeding into LSTMs
      7. Generating data for LSTMs
      8. Defining the LSTM
      9. Evaluating the results quantitatively
        1. BLEU
        2. ROUGE
        3. METEOR
        4. CIDEr
        5. BLEU-4 over time for our model
      10. Captions generated for test images
      11. Using TensorFlow RNN API with pretrained GloVe word vectors
        1. Loading GloVe word vectors
        2. Cleaning data
        3. Using pretrained embeddings with TensorFlow RNN API
          1. Defining the pretrained embedding layer and the adaptation layer
          2. Defining the LSTM cell and softmax layer
          3. Defining inputs and outputs
          4. Processing images and text differently
          5. Defining the LSTM output calculation
          6. Defining the logits and predictions
          7. Defining the sequence loss
          8. Defining the optimizer
      12. Summary
    14. 10. Sequence-to-Sequence Learning – Neural Machine Translation
      1. Machine translation
      2. A brief historical tour of machine translation
        1. Rule-based translation
        2. Statistical Machine Translation (SMT)
        3. Neural Machine Translation (NMT)
      3. Understanding Neural Machine Translation
        1. Intuition behind NMT
        2. NMT architecture
          1. The embedding layer
          2. The encoder
          3. The context vector
          4. The decoder
      4. Preparing data for the NMT system
        1. At training time
        2. Reversing the source sentence
        3. At testing time
      5. Training the NMT
      6. Inference with NMT
      7. The BLEU score – evaluating the machine translation systems
        1. Modified precision
        2. Brevity penalty
        3. The final BLEU score
      8. Implementing an NMT from scratch – a German to English translator
        1. Introduction to data
        2. Preprocessing data
        3. Learning word embeddings
        4. Defining the encoder and the decoder
        5. Defining the end-to-end output calculation
        6. Some translation results
      9. Training an NMT jointly with word embeddings
        1. Maximizing matchings between the dataset vocabulary and the pretrained embeddings
        2. Defining the embeddings layer as a TensorFlow variable
      10. Improving NMTs
        1. Teacher forcing
        2. Deep LSTMs
      11. Attention
        1. Breaking the context vector bottleneck
        2. The attention mechanism in detail
          1. Implementing the attention mechanism
          2. Defining weights
          3. Computing attention
        3. Some translation results – NMT with attention
        4. Visualizing attention for source and target sentences
      12. Other applications of Seq2Seq models – chatbots
        1. Training a chatbot
        2. Evaluating chatbots – Turing test
      13. Summary
    15. 11. Current Trends and the Future of Natural Language Processing
      1. Current trends in NLP
        1. Word embeddings
          1. Region embedding
            1. Input representation
            2. Learning region embeddings
            3. Implementation – region embeddings
            4. Classification accuracy
          2. Probabilistic word embedding
          3. Ensemble embedding
          4. Topic embedding
        2. Neural Machine Translation (NMT)
          1. Improving the attention mechanism
          2. Hybrid MT models
      2. Penetration into other research fields
        1. Combining NLP with computer vision
          1. Visual Question Answering (VQA)
          2. Caption generation for images with attention
        2. Reinforcement learning
          1. Teaching agents to communicate using their own language
          2. Dialogue agents with reinforcement learning
        3. Generative Adversarial Networks for NLP
      3. Towards Artificial General Intelligence
        1. One Model to Learn Them All
        2. A joint many-task model – growing a neural network for multiple NLP tasks
          1. First level – word-based tasks
          2. Second level – syntactic tasks
          3. Third level – semantic-level tasks
      4. NLP for social media
        1. Detecting rumors in social media
        2. Detecting emotions in social media
        3. Analyzing political framing in tweets
      5. New tasks emerging
        1. Detecting sarcasm
        2. Language grounding
        3. Skimming text with LSTMs
      6. Newer machine learning models
        1. Phased LSTM
        2. Dilated Recurrent Neural Networks (DRNNs)
      7. Summary
      8. References
    16. A. Mathematical Foundations and Advanced TensorFlow
      1. Basic data structures
        1. Scalar
        2. Vectors
        3. Matrices
        4. Indexing of a matrix
      2. Special types of matrices
        1. Identity matrix
        2. Diagonal matrix
        3. Tensors
      3. Tensor/matrix operations
        1. Transpose
        2. Multiplication
        3. Element-wise multiplication
        4. Inverse
        5. Finding the matrix inverse – Singular Value Decomposition (SVD)
        6. Norms
        7. Determinant
      4. Probability
        1. Random variables
        2. Discrete random variables
        3. Continuous random variables
        4. The probability mass/density function
        5. Conditional probability
        6. Joint probability
        7. Marginal probability
        8. Bayes' rule
      5. Introduction to Keras
      6. Introduction to the TensorFlow seq2seq library
        1. Defining embeddings for the encoder and decoder
        2. Defining the encoder
        3. Defining the decoder
      7. Visualizing word embeddings with TensorBoard
        1. Starting TensorBoard
        2. Saving word embeddings and visualizing via TensorBoard
      8. Summary
    17. Index