Basics of Natural Language Processing

If machine learning models operate only on numerical data, how can we transform text into a numerical representation? That question is a central concern of Natural Language Processing (NLP). Let's take a brief look at how this is done.

We'll begin with a small corpus of three sentences:

  1. The new kitten played with the other kittens
  2. She ate lunch
  3. She loved her kitten

We'll first convert our corpus into a bag-of-words (BOW) representation, skipping preprocessing for now. A BOW representation records each word and how many times it occurs in each document, producing what's called a term-document matrix. In a term-document matrix, each unique word is assigned to a column, and each document is assigned to ...
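As a rough illustration of the idea, here is a minimal sketch that builds word counts for the three-sentence corpus using only the Python standard library. Several things here are assumptions rather than anything taken from the text: the helper names `tokenize` and `build_term_document_matrix` are hypothetical, tokens are produced by lowercasing and splitting on whitespace (so "The" and "the" share a column, while "kitten" and "kittens" do not), columns are ordered alphabetically, and each document is laid out as one row of counts.

```python
from collections import Counter

corpus = [
    "The new kitten played with the other kittens",
    "She ate lunch",
    "She loved her kitten",
]


def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    # No stemming, stop-word removal, or punctuation handling.
    return text.lower().split()


def build_term_document_matrix(docs):
    # Count the tokens of each document separately.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # One column per unique word, sorted alphabetically (an arbitrary choice).
    vocabulary = sorted(set().union(*counts))
    # Assumption: one row of counts per document; absent words count as 0.
    matrix = [[doc_counts[word] for word in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


vocabulary, matrix = build_term_document_matrix(corpus)

print(vocabulary)
for row in matrix:
    print(row)
```

With this tokenization the corpus yields 12 unique words, and "the" is counted twice in the first sentence. A library vectorizer would typically make different tokenization choices, so its columns may not match this sketch exactly.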
