Leverage the power of machine learning and deep learning to extract information from text data
About This Book
- Implement Machine Learning and Deep Learning techniques for efficient natural language processing
- Get started with NLTK and implement NLP in your applications with ease
- Understand and interpret human languages with the power of text analysis via Python
Who This Book Is For
This book is intended for Python developers who wish to start with natural language processing and want to make their applications smarter by implementing NLP in them.
What You Will Learn
- Focus on the Python programming paradigms used to develop NLP applications
- Understand corpus analysis and different types of data attributes
- Learn NLP using Python libraries such as NLTK, Polyglot, spaCy, Stanford CoreNLP, and so on
- Learn about feature extraction and feature selection as part of feature engineering
- Explore the advantages of vectorization in Deep Learning
- Get a better understanding of the architecture of a rule-based system
- Optimize and fine-tune Supervised and Unsupervised Machine Learning algorithms for NLP problems
- Identify Deep Learning techniques for Natural Language Understanding and Natural Language Generation problems
This book starts off by laying the foundation for natural language processing and explaining why Python is one of the best options for building an NLP-based expert system, with advantages such as community support and the availability of frameworks. It then gives you a better understanding of freely available corpora and different types of datasets. After this, you will know how to choose a dataset for natural language processing applications, find the right NLP techniques to process the sentences in a dataset, and understand their structure. You will also learn how to tokenize different parts of sentences and how to analyze them.
Over the course of the book, you will explore both the semantic and the syntactic analysis of text. You will learn how to resolve the ambiguities that arise in processing human language and will work through various text analysis scenarios.
You will start with the basics of preparing your environment for natural language processing, move on to the initial setup, and then quickly learn about sentences and their component parts. You will also learn how to use Machine Learning and Deep Learning to extract information from text data.
By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.
Style and approach
This book teaches readers various aspects of natural language processing using NLTK, taking them smoothly from a basic to an advanced level.
Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code files sent to you.
Table of Contents
Practical Understanding of a Corpus and Dataset
- What is a corpus?
- Why do we need a corpus?
- Understanding corpus analysis
- Understanding types of data attributes
- Exploring different file formats for corpora
- Resources for accessing free corpora
- Preparing a dataset for NLP applications
- Web scraping
Understanding the Structure of a Sentence
- Understanding components of NLP
- Natural language understanding
- Defining context-free grammar
- What is morphology?
- What are morphemes?
- What is a stem?
- What is morphological analysis?
- What is a word?
- Classification of morphemes
- What is the difference between a stem and a root?
- Lexical analysis
- What is a token?
- What are part-of-speech tags?
- Process of deriving tokens
- Difference between stemming and lemmatization
- Syntactic analysis
- Semantic analysis
- Handling ambiguity
- Discourse integration
- Pragmatic analysis
- Handling corpus-raw text
- Handling corpus-raw sentences
- Basic preprocessing
- Practical and customized preprocessing
Feature Engineering and NLP Algorithms
- Understanding feature engineering
Basic features of NLP
Parsers and parsing
- Understanding the basics of parsers
- Understanding the concept of parsing
- Developing a parser from scratch
- Types of grammar
- Calculating the probability of a tree
- Calculating the probability of a string
- Grammar transformation
- Developing a parser with the Cocke-Younger-Kasami algorithm
- Developing parsers step-by-step
- Existing parser tools
- Customizing parser tools
- POS tagging and POS taggers
- Named entity recognition
- Bag of words
- Semantic tools and resources
Basic statistical features for NLP
- Basic mathematics
- Basic concepts of linear algebra for NLP
- Basic concepts of the probabilistic theory for NLP
- Encoders and decoders
- Probabilistic models
- Advantages of feature engineering
- Challenges of feature engineering
Advanced Feature Engineering and NLP Algorithms
- Recall word embedding
- Understanding the basics of word2vec
- Converting the word2vec model from black box to white box
- Understanding the components of the word2vec model
- Understanding the logic of the word2vec model
Understanding algorithmic techniques and the mathematics behind the word2vec model
- Understanding the basic mathematics for the word2vec algorithm
- Techniques used at the vocabulary building stage
- Techniques used at the context building stage
Algorithms used by neural networks
- Structure of the neurons
- Training a simple neuron
- Techniques used at the final vector generation and probability prediction stage
- Some facts related to word2vec
- Applications of word2vec
- Implementation of simple examples
- Advantages of word2vec
- Challenges of word2vec
- How is word2vec used in real-life applications?
- When should you use word2vec?
- Developing something interesting
- Extension of the word2vec concept
- Importance of vectorization in deep learning
Rule-Based System for NLP
- Understanding the rule-based system
- Purpose of having the rule-based system
- Architecture of the rule-based (RB) system
- Understanding the RB system development life cycle
Developing NLP applications using the RB system
- Thinking process for making rules
- Python for pattern-matching rules for a proofreading application
- Grammar correction
- Template-based chatbot application
- Comparing the rule-based approach with other approaches
- Advantages of the rule-based system
- Disadvantages of the rule-based system
- Challenges for the rule-based system
- Understanding word-sense disambiguation basics
- Discussing recent trends for the rule-based system
Machine Learning for NLP Problems
- Understanding the basics of machine learning
- Development steps for NLP applications
Understanding ML algorithms and other concepts
- Supervised ML
- Unsupervised ML
- Semi-supervised ML
- Hybrid approaches for NLP applications
Deep Learning for NLU and NLG Problems
An overview of artificial intelligence
- The basics of AI
- Stages of AI
- Types of artificial intelligence
- Goals and applications of AI
- Comparing NLU and NLG
- A brief overview of deep learning
- Basics of neural networks
- Implementation of ANN
- Deep learning and deep neural networks
- Deep learning techniques and NLU
- Deep learning techniques and NLG
- Gradient descent-based optimization
- Artificial intelligence versus human intelligence
Advanced Tools
How to Improve Your NLP Skills
Installation Guide
- Title: Python Natural Language Processing
- Release date: July 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787121423