Basics of Natural Language Processing

If machine learning models only operate on numerical data, how can we transform our text into a numerical representation? That is exactly the focus of Natural Language Processing (NLP). Let's take a brief look at how this is done.

We'll begin with a small corpus of three sentences:

  1. The new kitten played with the other kittens
  2. She ate lunch
  3. She loved her kitten

We'll first convert our corpus into a bag-of-words (BOW) representation. We'll skip preprocessing for now. Converting our corpus into a BOW representation involves taking each word and its count to create what's called a term-document matrix. In a term-document matrix, each unique word is assigned to a column, and each document is assigned to ...

Get Python Machine Learning Blueprints - Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.