Book description
Over 80 practical recipes on natural language processing techniques using Python's NLTK 3.0
In Detail
This book will show you the essential techniques of text and language processing. Starting with tokenization, stemming, and the WordNet dictionary, you'll progress to part-of-speech tagging, phrase chunking, and named entity recognition. You'll learn how various text corpora are organized, as well as how to create your own custom corpus. Then, you'll move on to text classification with a focus on sentiment analysis. Because NLP can be computationally expensive on large bodies of text, you'll also try a few methods for distributed text processing. Finally, you'll be introduced to a number of other small but complementary Python libraries for text analysis, cleaning, and parsing.
This cookbook provides simple, straightforward examples so you can quickly learn text processing with Python and NLTK.
What You Will Learn
- Tokenize text into sentences, and sentences into words
- Look up words in the WordNet dictionary
- Apply spelling correction and word replacement
- Access the built-in text corpora and create your own custom corpus
- Tag words with parts of speech
- Chunk phrases and recognize named entities
- Grammatically transform phrases and chunks
- Classify text and perform sentiment analysis
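A quick taste of the recipe style: a minimal sketch of regex-based word replacement (expanding contractions), one flavour of the word replacement listed above. The class name and patterns here are illustrative, not the book's exact code.

    import re

    # Each pair is (regular expression, replacement); order matters, since
    # "won't" must be handled before the generic "n't" rule.
    REPLACEMENT_PATTERNS = [
        (r"won't", "will not"),
        (r"can't", "cannot"),
        (r"(\w+)'ll", r"\1 will"),
        (r"(\w+)n't", r"\1 not"),
    ]

    class RegexpReplacer:
        def __init__(self, patterns=REPLACEMENT_PATTERNS):
            # Pre-compile each pattern once for repeated use.
            self.patterns = [(re.compile(regex), repl) for regex, repl in patterns]

        def replace(self, text):
            # Apply every substitution in turn.
            for pattern, repl in self.patterns:
                text = pattern.sub(repl, text)
            return text

    replacer = RegexpReplacer()
    print(replacer.replace("I can't believe it won't work"))
    # -> I cannot believe it will not work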
Table of contents
- Python 3 Text Processing with NLTK 3 Cookbook
- Table of Contents
- Python 3 Text Processing with NLTK 3 Cookbook
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Preface
- 1. Tokenizing Text and WordNet Basics
- Introduction
- Tokenizing text into sentences
- Tokenizing sentences into words
- Tokenizing sentences using regular expressions
- Training a sentence tokenizer
- Filtering stopwords in a tokenized sentence
- Looking up Synsets for a word in WordNet
- Looking up lemmas and synonyms in WordNet
- Calculating WordNet Synset similarity
- Discovering word collocations
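As a taste of the ground Chapter 1 covers, here is a minimal sketch of sentence and word tokenization, stopword filtering, and a WordNet Synset lookup. It assumes the punkt, stopwords, and wordnet data packages have already been installed with nltk.download().

    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.corpus import stopwords, wordnet

    text = "NLTK makes text processing straightforward. It ships with many corpora."
    sentences = sent_tokenize(text)        # split the text into sentences
    words = word_tokenize(sentences[0])    # split the first sentence into words

    # Filter out common English stopwords.
    english_stops = set(stopwords.words('english'))
    content_words = [w for w in words if w.lower() not in english_stops]

    # Look up the first Synset for 'cookbook' and print its definition.
    syn = wordnet.synsets('cookbook')[0]
    print(content_words)
    print(syn.name(), '-', syn.definition())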
- 2. Replacing and Correcting Words
- 3. Creating Custom Corpora
- Introduction
- Setting up a custom corpus
- Creating a wordlist corpus
- Creating a part-of-speech tagged word corpus
- Creating a chunked phrase corpus
- Creating a categorized text corpus
- Creating a categorized chunk corpus reader
- Lazy corpus loading
- Creating a custom corpus view
- Creating a MongoDB-backed corpus reader
- Corpus editing with file locking
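The custom corpus recipes in Chapter 3 build on NLTK's corpus reader classes. Below is a minimal sketch using PlaintextCorpusReader; the directory name and file pattern are hypothetical.

    from nltk.corpus.reader import PlaintextCorpusReader

    # Point the reader at a directory of .txt files (hypothetical path).
    reader = PlaintextCorpusReader('corpus', r'.*\.txt')

    print(reader.fileids())      # files matched by the pattern
    print(reader.words()[:10])   # tokenized words across the whole corpus
    print(len(reader.sents()))   # number of sentences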
- 4. Part-of-speech Tagging
- Introduction
- Default tagging
- Training a unigram part-of-speech tagger
- Combining taggers with backoff tagging
- Training and combining ngram taggers
- Creating a model of likely word tags
- Tagging with regular expressions
- Affix tagging
- Training a Brill tagger
- Training the TnT tagger
- Using WordNet for tagging
- Tagging proper names
- Classifier-based tagging
- Training a tagger with NLTK-Trainer
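As an illustration of the tagging and backoff ideas in Chapter 4, here is a minimal sketch that trains a unigram tagger on NLTK's treebank sample with a default-tag backoff (requires the treebank corpus data).

    from nltk.corpus import treebank
    from nltk.tag import DefaultTagger, UnigramTagger

    train_sents = treebank.tagged_sents()[:3000]
    test_sents = treebank.tagged_sents()[3000:]

    # Unknown words fall back to the default 'NN' tag.
    backoff = DefaultTagger('NN')
    tagger = UnigramTagger(train_sents, backoff=backoff)

    print(tagger.tag(['The', 'cat', 'sat', 'on', 'the', 'mat']))
    print(tagger.evaluate(test_sents))   # accuracy on the held-out sentences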
- 5. Extracting Chunks
- Introduction
- Chunking and chinking with regular expressions
- Merging and splitting chunks with regular expressions
- Expanding and removing chunks with regular expressions
- Partial parsing with regular expressions
- Training a tagger-based chunker
- Classification-based chunking
- Extracting named entities
- Extracting proper noun chunks
- Extracting location chunks
- Training a named entity chunker
- Training a chunker with NLTK-Trainer
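A minimal sketch of the kind of chunking Chapter 5 deals with: regular-expression noun phrase chunking plus NLTK's built-in named entity chunker. The grammar is illustrative, and the named entity step needs the maxent_ne_chunker and words data packages.

    import nltk
    from nltk.chunk import RegexpParser

    tagged = nltk.pos_tag(nltk.word_tokenize("John works for Acme Corp in Boston."))

    # Chunk determiner/adjective/noun sequences into NP phrases.
    chunker = RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}")
    print(chunker.parse(tagged))

    # Built-in named entity recognition over the same tagged sentence.
    print(nltk.ne_chunk(tagged))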
- 6. Transforming Chunks and Trees
- Introduction
- Filtering insignificant words from a sentence
- Correcting verb forms
- Swapping verb phrases
- Swapping noun cardinals
- Swapping infinitive phrases
- Singularizing plural nouns
- Chaining chunk transformations
- Converting a chunk tree to text
- Flattening a deep tree
- Creating a shallow tree
- Converting tree labels
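Chapter 6 works with chunk trees rather than raw text. The helper below is an illustrative sketch, not the book's exact transformation code: it parses a tagged sentence into a shallow tree, then joins the leaves back into plain text.

    from nltk.chunk import RegexpParser

    tagged = [('the', 'DT'), ('quick', 'JJ'), ('fox', 'NN'), ('jumped', 'VBD')]
    tree = RegexpParser(r"NP: {<DT>?<JJ>*<NN>}").parse(tagged)

    def chunk_tree_to_text(tree):
        # Drop the part-of-speech tags and rebuild a plain string from the leaves.
        return ' '.join(word for word, tag in tree.leaves())

    print(tree)                      # (S (NP the/DT quick/JJ fox/NN) jumped/VBD)
    print(chunk_tree_to_text(tree))  # the quick fox jumped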
- 7. Text Classification
- Introduction
- Bag of words feature extraction
- Training a Naive Bayes classifier
- Training a decision tree classifier
- Training a maximum entropy classifier
- Training scikit-learn classifiers
- Measuring precision and recall of a classifier
- Calculating high information words
- Combining classifiers with voting
- Classifying with multiple binary classifiers
- Training a classifier with NLTK-Trainer
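A minimal sketch of bag-of-words feature extraction and Naive Bayes training on NLTK's movie_reviews corpus, the general approach behind Chapter 7's classification recipes (the book's exact feature handling may differ).

    import random
    from nltk.corpus import movie_reviews
    from nltk.classify import NaiveBayesClassifier, accuracy

    def bag_of_words(words):
        # Every word that occurs becomes a True-valued feature.
        return dict((word, True) for word in words)

    labeled_feats = [
        (bag_of_words(movie_reviews.words(fileid)), category)
        for category in movie_reviews.categories()
        for fileid in movie_reviews.fileids(category)
    ]
    random.shuffle(labeled_feats)

    train_feats, test_feats = labeled_feats[100:], labeled_feats[:100]
    classifier = NaiveBayesClassifier.train(train_feats)
    print(accuracy(classifier, test_feats))
    classifier.show_most_informative_features(5)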
- 8. Distributed Processing and Handling Large Datasets
- Introduction
- Distributed tagging with execnet
- Distributed chunking with execnet
- Parallel list processing with execnet
- Storing a frequency distribution in Redis
- Storing a conditional frequency distribution in Redis
- Storing an ordered dictionary in Redis
- Distributed word scoring with Redis and execnet
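The Redis recipes in Chapter 8 revolve around keeping counts in Redis hashes. Here is a minimal sketch of that core idea using redis-py's hincrby rather than the book's own frequency distribution class; it assumes a Redis server is running on localhost.

    import redis
    from nltk.tokenize import word_tokenize

    r = redis.Redis(host='localhost', port=6379, db=0)

    for word in word_tokenize("the cat sat on the mat"):
        # HINCRBY atomically increments the per-word counter in the 'freqdist' hash.
        r.hincrby('freqdist', word.lower(), 1)

    print(r.hget('freqdist', 'the'))   # b'2' on a fresh database
    print(r.hgetall('freqdist'))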
- 9. Parsing Specific Data Types
- A. Penn Treebank Part-of-speech Tags
- Index
Product information
- Title: Python 3 Text Processing with NLTK 3 Cookbook
- Author(s):
- Release date: August 2014
- Publisher(s): Packt Publishing
- ISBN: 9781782167853