Book description
Make NLP easy by building chatbots and models and by carrying out a range of NLP tasks to gain data-driven insights from raw text data.
Key Features
- Get familiar with key natural language processing (NLP) concepts and terminology
- Explore the functionalities and features of popular NLP tools
- Learn how to use Python programming and third-party libraries to perform NLP tasks
Book Description
Do you want to learn how to communicate with computer systems using Natural Language Processing (NLP) techniques, or make a machine understand human sentiments? Do you want to build applications like Siri, Alexa, or chatbots, even if you've never done it before?
With The Natural Language Processing Workshop, you can expect to make consistent progress as a beginner and get up to speed interactively through hands-on activities and fun exercises.
The book starts with an introduction to NLP. You'll study different approaches to NLP tasks, and perform exercises in Python to understand the process of preparing datasets for NLP models. Next, you'll use advanced NLP algorithms and visualization techniques to collect datasets from open websites, and to summarize and generate random text from a document. In the final chapters, you'll use NLP to create a chatbot that detects positive or negative sentiment in text documents such as movie reviews.
By the end of this book, you'll be equipped with the essential NLP tools and techniques you need to solve common business problems that involve processing text.
What you will learn
- Obtain, verify, clean, and transform text data into the correct format for use
- Use methods such as tokenization and stemming for text extraction
- Develop a classifier to classify comments in Wikipedia articles
- Collect data from open websites with the help of web scraping
- Train a model to detect topics in a set of documents using topic modeling
- Discover techniques to represent text as word and document vectors
Who this book is for
This book is for beginner to mid-level data scientists, machine learning developers, and NLP enthusiasts. A basic understanding of machine learning and NLP will help you grasp the topics in this workshop more quickly.
Table of contents
- The Natural Language Processing Workshop
- Preface
- 1. Introduction to Natural Language Processing
- Introduction
- History of NLP
- Text Analytics and NLP
- Various Steps in NLP
- Tokenization
- Exercise 1.02: Tokenization of a Simple Sentence
- PoS Tagging
- Exercise 1.03: PoS Tagging
- Stop Word Removal
- Exercise 1.04: Stop Word Removal
- Text Normalization
- Exercise 1.05: Text Normalization
- Spelling Correction
- Exercise 1.06: Spelling Correction of a Word and a Sentence
- Stemming
- Exercise 1.07: Using Stemming
- Lemmatization
- Exercise 1.08: Extracting the Base Word Using Lemmatization
- Named Entity Recognition (NER)
- Exercise 1.09: Treating Named Entities
- Word Sense Disambiguation
- Sentence Boundary Detection
- Kick Starting an NLP Project
- Summary
- 2. Feature Extraction Methods
- Introduction
- Types of Data
- Cleaning Text Data
- Tokenization
- Exercise 2.01: Text Cleaning and Tokenization
- Exercise 2.02: Extracting n-grams
- Exercise 2.03: Tokenizing Text with Keras and TextBlob
- Types of Tokenizers
- Exercise 2.04: Tokenizing Text Using Various Tokenizers
- Stemming
- RegexpStemmer
- Exercise 2.05: Converting Words in the Present Continuous Tense into Base Words with RegexpStemmer
- The Porter Stemmer
- Exercise 2.06: Using the Porter Stemmer
- Lemmatization
- Exercise 2.07: Performing Lemmatization
- Exercise 2.08: Singularizing and Pluralizing Words
- Language Translation
- Exercise 2.09: Language Translation
- Stop-Word Removal
- Exercise 2.10: Removing Stop Words from Text
- Activity 2.01: Extracting Top Keywords from the News Article
- Feature Extraction from Texts
- Extracting General Features from Raw Text
- Exercise 2.11: Extracting General Features from Raw Text
- Exercise 2.12: Extracting General Features from Text
- Bag of Words (BoW)
- Exercise 2.13: Creating a Bag of Words
- Zipf's Law
- Exercise 2.14: Zipf's Law
- Term Frequency–Inverse Document Frequency (TFIDF)
- Exercise 2.15: TFIDF Representation
- Finding Text Similarity – Application of Feature Extraction
- Exercise 2.16: Calculating Text Similarity Using Jaccard and Cosine Similarity
- Word Sense Disambiguation Using the Lesk Algorithm
- Exercise 2.17: Implementing the Lesk Algorithm Using String Similarity and Text Vectorization
- Word Clouds
- Exercise 2.18: Generating Word Clouds
- Other Visualizations
- Exercise 2.19: Other Visualizations – Dependency Parse Trees and Named Entities
- Activity 2.02: Text Visualization
- Summary
- 3. Developing a Text Classifier
- Introduction
- Machine Learning
- Supervised Learning
- Classification
- Logistic Regression
- Exercise 3.03: Text Classification – Logistic Regression
- Naive Bayes Classifiers
- Exercise 3.04: Text Classification – Naive Bayes
- k-nearest Neighbors
- Exercise 3.05: Text Classification Using the k-nearest Neighbors Method
- Regression
- Linear Regression
- Exercise 3.06: Regression Analysis Using Textual Data
- Tree Methods
- Exercise 3.07: Tree-Based Methods – Decision Tree
- Random Forest
- Gradient Boosting Machine and Extreme Gradient Boost
- Exercise 3.08: Tree-Based Methods – Random Forest
- Exercise 3.09: Tree-Based Methods – XGBoost
- Sampling
- Exercise 3.10: Sampling (Simple Random, Stratified, and Multi-Stage)
- Developing a Text Classifier
- Feature Extraction
- Feature Engineering
- Removing Correlated Features
- Exercise 3.11: Removing Highly Correlated Features (Tokens)
- Dimensionality Reduction
- Exercise 3.12: Performing Dimensionality Reduction Using Principal Component Analysis
- Deciding on a Model Type
- Evaluating the Performance of a Model
- Exercise 3.13: Calculating the RMSE and MAPE of a Dataset
- Activity 3.01: Developing End-to-End Text Classifiers
- Building Pipelines for NLP Projects
- Saving and Loading Models
- Summary
- 4. Collecting Text Data with Web Scraping and APIs
- Introduction
- Collecting Data by Scraping Web Pages
- Exercise 4.01: Extraction of Tag-Based Information from HTML Files
- Requesting Content from Web Pages
- Exercise 4.02: Collecting Online Text Data
- Exercise 4.03: Analyzing the Content of Jupyter Notebooks (in HTML Format)
- Activity 4.01: Extracting Information from an Online HTML Page
- Activity 4.02: Extracting and Analyzing Data Using Regular Expressions
- Dealing with Semi-Structured Data
- Summary
- 5. Topic Modeling
- Introduction
- Topic Discovery
- Topic-Modeling Algorithms
- Key Input Parameters for LSA Topic Modeling
- Exercise 5.01: Analyzing Wikipedia World Cup Articles with Latent Semantic Analysis
- Dirichlet Process and Dirichlet Distribution
- Latent Dirichlet Allocation (LDA)
- LDA – How It Works
- Measuring the Predictive Power of a Generative Topic Model
- Exercise 5.02: Finding Topics in Canadian Open Data Inventory Using the LDA Model
- Activity 5.01: Topic-Modeling Jeopardy Questions
- Hierarchical Dirichlet Process (HDP)
- Summary
- 6. Vector Representation
- Introduction
- What Is a Vector?
- Frequency-Based Embeddings
- Exercise 6.01: Word-Level One-Hot Encoding
- Character-Level One-Hot Encoding
- Exercise 6.02: Character One-Hot Encoding – Manual
- Exercise 6.03: Character-Level One-Hot Encoding with Keras
- Learned Word Embeddings
- Word2Vec
- Exercise 6.04: Training Word Vectors
- Using Pre-Trained Word Vectors
- Exercise 6.05: Using Pre-Trained Word Vectors
- Document Vectors
- Uses of Document Vectors
- Exercise 6.06: Converting News Headlines to Document Vectors
- Activity 6.01: Finding Similar News Article Using Document Vectors
- Summary
- 7. Text Generation and Summarization
- 8. Sentiment Analysis
- Appendix
Product information
- Title: The Natural Language Processing Workshop
- Author(s):
- Release date: August 2020
- Publisher(s): Packt Publishing
- ISBN: 9781800208421