Natural Language Processing and Computational Linguistics

Book Description

Work with Python and powerful open source tools such as Gensim and spaCy to perform modern text analysis, natural language processing, and computational linguistics.

About This Book
  • Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
  • Get hands-on with text analysis in Python, featuring natural language processing and computational linguistics algorithms
  • Learn deep learning techniques for text analysis
Who This Book Is For

This book is for you if you want to dive hands-first into the interesting world of text analysis and NLP, and you're ready to work with the rich Python ecosystem of tools and datasets waiting for you!

What You Will Learn
  • Learn why text analysis is important in our modern age
  • Understand NLP terminology and get to know the Python tools and datasets
  • Learn how to pre-process and clean textual data
  • Convert textual data into vector space representations
  • Use spaCy to process text
  • Train your own NLP models for computational linguistics
  • Use statistical learning and Topic Modeling algorithms for text, using Gensim and scikit-learn
  • Employ deep learning techniques for text analysis using Keras
In Detail

Modern text analysis is now very accessible using Python and open source tools. Discover how you can perform it in this era of textual data.

This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights about the data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now, with Python and tools like Gensim and spaCy.

You'll start by learning about data cleaning, and then how to approach computational linguistics from first concepts. You'll then be ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning.

This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing and computational linguistics projects. You'll discover the rich ecosystem of Python tools available for NLP and enter the interesting world of modern text analysis.

Style and approach

The book teaches NLP from the angle of a practitioner as well as that of a student. This is a tad unusual, but given the enormous speed at which new algorithms and approaches travel from scientific beginnings to industrial implementation, first principles can be clarified with the help of entirely practical examples.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Natural Language Processing and Computational Linguistics
  3. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  4. Contributors
    1. About the author
    2. About the reviewers
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. What is Text Analysis?
    1. What is text analysis?
    2. Where's the data at?
    3. Garbage in, garbage out
    4. Why should you do text analysis?
    5. Summary
    6. References
  7. Python Tips for Text Analysis
    1. Why Python?
    2. Text manipulation in Python
    3. Summary
    4. References
  8. spaCy's Language Models
    1. spaCy
    2. Installation
      1. Troubleshooting
      2. Language models
      3. Installing language models
      4. Installation – how and why?
      5. Basic preprocessing with language models
    3. Tokenizing text
      1. Part-of-speech (POS) – tagging
      2. Named entity recognition
      3. Rule-based matching
      4. Preprocessing
    4. Summary
    5. References
  9. Gensim – Vectorizing Text and Transformations and n-grams
    1. Introducing Gensim
    2. Vectors and why we need them
      1. Bag-of-words
      2. TF-IDF
      3. Other representations
    3. Vector transformations in Gensim
    4. n-grams and some more preprocessing
    5. Summary
    6. References
  10. POS-Tagging and Its Applications
    1. What is POS-tagging?
    2. POS-tagging in Python
      1. POS-tagging with spaCy
    3. Training our own POS-taggers
    4. POS-tagging code examples
    5. Summary
    6. References
  11. NER-Tagging and Its Applications
    1. What is NER-tagging?
    2. NER-tagging in Python
      1. NER-tagging with spaCy
    3. Training our own NER-taggers
    4. NER-tagging examples and visualization
    5. Summary
    6. References
  12. Dependency Parsing
    1. Dependency parsing
    2. Dependency parsing in Python
    3. Dependency parsing with spaCy
    4. Training our dependency parsers
    5. Summary
    6. References
  13. Topic Models
    1. What are topic models?
    2. Topic models in Gensim
      1. Latent Dirichlet allocation
      2. Latent semantic indexing
        1. Hierarchical Dirichlet process
      3. Dynamic topic models
    3. Topic models in scikit-learn
    4. Summary
    5. References
  14. Advanced Topic Modeling
    1. Advanced training tips
    2. Exploring documents
    3. Topic coherence and evaluating topic models
    4. Visualizing topic models
    5. Summary
    6. References
  15. Clustering and Classifying Text
    1. Clustering text
    2. Starting clustering
    3. K-means
    4. Hierarchical clustering
    5. Classifying text
    6. Summary
    7. References
  16. Similarity Queries and Summarization
    1. Similarity metrics
    2. Similarity queries
    3. Summarizing text
    4. Summary
    5. References
  17. Word2Vec, Doc2Vec, and Gensim
    1. Word2Vec
      1. Using Word2Vec with Gensim
    2. Doc2Vec
    3. Other word embeddings
      1. GloVe
      2. FastText
      3. WordRank
      4. Varembed
      5. Poincare
    4. Summary
    5. References
  18. Deep Learning for Text
    1. Deep learning
    2. Deep learning for text (and more)
    3. Generating text
    4. Summary
    5. References
  19. Keras and spaCy for Deep Learning
    1. Keras and spaCy
    2. Classification with Keras
    3. Classification with spaCy
    4. Summary
    5. References
  20. Sentiment Analysis and ChatBots
    1. Sentiment analysis
      1. Reddit for mining data
      2. Twitter for mining data
    2. ChatBots
    3. Summary
    4. References
  21. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think