Hands-on NLP with NLTK and Scikit-learn

Video description

A complete Python guide to Natural Language Processing to build spam filters, topic classifiers, and sentiment analyzers

About This Video

  • Build actual solutions backed by machine learning and Natural Language Processing models, instead of meandering in theory and mathematical symbols.
  • Single-handedly build three models: one for spam filtering, one for sentiment analysis, and one for text classification.
  • Get the right foundation from which to do applied, actual Natural Language Processing. We show you how to get open source data, wrangle text into Python data structures with NLTK, and predict different classes of natural language with scikit-learn.

In Detail

There is an overflow of text data online nowadays. As a Python developer, you need to create a new solution using Natural Language Processing for your next project. Your colleagues depend on you to monetize gigabytes of unstructured text data. What do you do?

Hands-on NLP with NLTK and scikit-learn is the answer. This course puts you right on the spot, starting off with building a spam classifier in our first video. At the end of the course, you are going to walk away with three NLP applications: a spam filter, a topic classifier, and a sentiment analyzer. There is no need for fancy mathematical theory, just plain English explanations of core NLP concepts and how to apply them using Python libraries.

Taking this course will help you create new applications with Python and NLP. You will be able to build actual solutions backed by machine learning and NLP models with ease.

This course uses Python 3.6, TensorFlow 1.4, NLTK 2, and scikit-learn 0.19. While these are not the latest versions available, the course provides relevant and informative content for legacy users of NLP with NLTK and scikit-learn.

Audience

This course is for developers, data scientists, and programmers who want to learn about practical Natural Language Processing with Python in a hands-on way. Developers who have an upcoming project that needs NLP, or a pile of unstructured text data on their hands and no idea what to do with it, will find this course useful. Prior programming experience with Python is assumed, along with familiarity with machine learning terms such as supervised learning, regression, and classification. No prior Natural Language Processing or text mining experience is needed.

Table of contents

  1. Chapter 1 : Working with Natural Language Data
    1. The Course Overview
    2. Use Python, NLTK, spaCy, and Scikit-learn to Build Your NLP Toolset
    3. Reading a Simple Natural Language File into Memory
    4. Split the Text into Individual Words with Regular Expression
    5. Converting Words into Lists of Lower Case Tokens
    6. Removing Uncommon Words and Stop Words
  2. Chapter 2 : Spam Classification with an Email Dataset
    1. Use an Open Source Dataset, and What Is the Enron Dataset
    2. Loading the Enron Dataset into Memory
    3. Tokenization, Lemmatization, and Stop Word Removal
    4. Bag-of-Words Feature Extraction Process with Scikit-learn
    5. Basic Spam Classification with NLTK's Naive Bayes
  3. Chapter 3 : Sentiment Analysis with a Movie Review Dataset
    1. Understanding the Origin and Features of the Movie Review Dataset
    2. Loading and Cleaning the Review Data
    3. Preprocessing the Dataset to Remove Unwanted Words and Characters
    4. Creating TF-IDF Weighted Natural Language Features
    5. Basic Sentiment Analysis with Logistic Regression Model
  4. Chapter 4 : Boosting the Performance of Your Models with N-grams
    1. Deep Dive into Raw Tokens from the Movie Reviews
    2. Advanced Cleaning of Tokens Using Python String Functions and Regex
    3. Creating N-gram Features Using Scikit-learn
    4. Experimenting with Advanced Scikit-learn Models Using the NLTK Wrapper
    5. Building a Voting Model with Scikit-learn
  5. Chapter 5 : Document Classification with a Newsgroup Dataset
    1. Understanding the Origin and Features of the 20 Newsgroups Dataset
    2. Loading the Newsgroup Data and Extracting Features
    3. Building a Document Classification Pipeline
    4. Creating a Performance Report of the Model on the Test Set
    5. Finding Optimal Hyper-parameters Using Grid Search
  6. Chapter 6 : Advanced Topic Modelling with TF-IDF, LSA, and SVMs
    1. Building a Text Preprocessing Pipeline with NLTK
    2. Creating Hashing Based Features from Natural Language
    3. Classify Documents into 20 Topics with LSA
    4. Document Classification with TF-IDF and SVMs
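
As a rough illustration of the preprocessing steps listed in Chapter 1 (splitting text with a regular expression, lower-casing tokens, and removing stop words and uncommon words), here is a minimal sketch using only the Python standard library. It is not taken from the course: the stop word list, the function names, and the `min_count` threshold are assumptions made for this example, and the course itself works with NLTK.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# a real project would use a fuller list such as the one NLTK ships.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(text):
    """Lower-case the text and split it into word tokens with a regex."""
    return re.findall(r"[a-z']+", text.lower())


def clean_tokens(tokens, min_count=2):
    """Drop stop words and words seen fewer than min_count times."""
    counts = Counter(tokens)
    return [
        token
        for token in tokens
        if token not in STOP_WORDS and counts[token] >= min_count
    ]


text = "The spam filter flags spam. The filter is simple, and the filter is fast."
tokens = tokenize(text)
cleaned = clean_tokens(tokens)
print(cleaned)
```

The cleaned token list is the kind of input that later chapters turn into bag-of-words or TF-IDF features before training a classifier.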

Product information

  • Title: Hands-on NLP with NLTK and Scikit-learn
  • Author(s): Colibri Ltd
  • Release date: July 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789345612