NLTK Essentials

Book Description

Build cool NLP and machine learning applications using NLTK and other Python libraries

In Detail

Natural Language Processing (NLP) is the field of artificial intelligence and computational linguistics that deals with the interactions between computers and human languages. As human-computer interaction becomes more common, it is increasingly important for computers to understand all the major natural languages. The Natural Language Toolkit (NLTK) is a powerful and robust Python library built for this kind of work.

The book opens with an introduction to how systems are built around NLP. It then moves on to data science-related tasks, after which you will learn how to create a customized tokenizer and parser from scratch. Throughout, it covers the essential concepts of NLP while giving practical insight into the various open source tools and libraries available for NLP in Python. You will then learn how to analyze social media sites to discover trending topics and perform sentiment analysis. Finally, you will see tools that help you deal with large-scale text.

By the end of this book, you will be confident about NLP and data science concepts and know how to apply them in your day-to-day work.

What You Will Learn

  • Get a glimpse of the complexity of natural languages and how they are processed by machines
  • Clean and wrangle text using tokenization and chunking to help you better process data
  • Explore the different types of tags available and learn how to tag sentences
  • Create a customized parser and tokenizer to suit your needs
  • Build a real-life application with features such as spell correction, search, machine translation, and a question answering system
  • Retrieve web content using crawling and scraping
  • Perform feature extraction and selection, and build classification systems for different kinds of text
  • Use various other Python libraries such as pandas, scikit-learn, matplotlib, and gensim
  • Analyze social media sites to discover trending topics and perform sentiment analysis

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. NLTK Essentials
    1. Table of Contents
    2. NLTK Essentials
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. Chapter 1: Introduction to Natural Language Processing
      1. Why learn NLP?
      2. Let's start playing with Python!
        1. Lists
        2. Helping yourself
        3. Regular expressions
        4. Dictionaries
        5. Writing functions
      3. Diving into NLTK
      4. Your turn
      5. Summary
    9. Chapter 2: Text Wrangling and Cleansing
      1. What is text wrangling?
      2. Text cleansing
      3. Sentence splitter
      4. Tokenization
      5. Stemming
      6. Lemmatization
      7. Stop word removal
      8. Rare word removal
      9. Spell correction
      10. Your turn
      11. Summary
    10. Chapter 3: Part of Speech Tagging
      1. What is part of speech tagging?
        1. Stanford tagger
        2. Diving deep into a tagger
        3. Sequential tagger
          1. N-gram tagger
          2. Regex tagger
        4. Brill tagger
        5. Machine learning based tagger
      2. Named Entity Recognition (NER)
        1. NER tagger
      3. Your turn
      4. Summary
    11. Chapter 4: Parsing Structure in Text
      1. Shallow versus deep parsing
      2. The two approaches in parsing
      3. Why we need parsing
      4. Different types of parsers
        1. A recursive descent parser
        2. A shift-reduce parser
        3. A chart parser
        4. A regex parser
      5. Dependency parsing
      6. Chunking
      7. Information extraction
        1. Named-entity recognition (NER)
        2. Relation extraction
      8. Summary
    12. Chapter 5: NLP Applications
      1. Building your first NLP application
      2. Other NLP applications
        1. Machine translation
        2. Statistical machine translation
        3. Information retrieval
          1. Boolean retrieval
          2. Vector space model
          3. The probabilistic model
        4. Speech recognition
        5. Text classification
        6. Information extraction
        7. Question answering systems
        8. Dialog systems
        9. Word sense disambiguation
        10. Topic modeling
        11. Language detection
        12. Optical character recognition
      3. Summary
    13. Chapter 6: Text Classification
      1. Machine learning
      2. Text classification
      3. Sampling
        1. Naive Bayes
        2. Decision trees
        3. Stochastic gradient descent
        4. Logistic regression
        5. Support vector machines
      4. The random forest algorithm
      5. Text clustering
        1. K-means
      6. Topic modeling in text
        1. Installing gensim
      7. References
      8. Summary
    14. Chapter 7: Web Crawling
      1. Web crawlers
      2. Writing your first crawler
      3. Data flow in Scrapy
        1. The Scrapy shell
        2. Items
      4. The Sitemap spider
      5. The item pipeline
      6. External references
      7. Summary
    15. Chapter 8: Using NLTK with Other Python Libraries
      1. NumPy
        1. ndarray
          1. Indexing
        2. Basic operations
        3. Extracting data from an array
        4. Complex matrix operations
          1. Reshaping and stacking
          2. Random numbers
      2. SciPy
        1. Linear algebra
        2. Eigenvalues and eigenvectors
        3. The sparse matrix
        4. Optimization
      3. pandas
        1. Reading data
        2. Series data
        3. Column transformation
        4. Noisy data
      4. matplotlib
        1. Subplot
        2. Adding an axis
        3. A scatter plot
        4. A bar plot
        5. 3D plots
      5. External references
      6. Summary
    16. Chapter 9: Social Media Mining in Python
      1. Data collection
        1. Twitter
      2. Data extraction
        1. Trending topics
      3. Geovisualization
        1. Influencer detection
        2. Facebook
        3. Influencer friends
      4. Summary
    17. Chapter 10: Text Mining at Scale
      1. Different ways of using Python on Hadoop
        1. Python streaming
        2. Hive/Pig UDF
        3. Streaming wrappers
      2. NLTK on Hadoop
        1. A UDF
        2. Python streaming
      3. Scikit-learn on Hadoop
      4. PySpark
      5. Summary
    18. Index