Chapter 9: Natural Language Processing

Terabytes of text data are created every day by users of all sorts of software, from enterprise systems to social networks. All this unprocessed data hides opportunities to improve how businesses work.

In this chapter, we will learn how to clean and process text data so that we can derive features from it and use those features as input to machine learning models.

The topics we will be covering in this chapter are as follows:

Natural language processing
Removing unwanted strings
Stemming and lemmatization
word_tokenizer
Feature extraction from text
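
To make the topics above concrete, here is a minimal sketch of the same pipeline in plain Python: remove unwanted strings, tokenize, reduce words to a rough stem, and extract bag-of-words counts as features. It deliberately uses only the standard library and does not show the Optimus API; the function names (remove_unwanted, tokenize, naive_stem, bag_of_words) and the sample sentences are hypothetical, and the suffix-stripping stemmer is a toy stand-in for a real stemmer or lemmatizer.

```python
import re
from collections import Counter


def remove_unwanted(text):
    """Lowercase the text, then drop URLs, digits, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def tokenize(text):
    """Split cleaned text into word tokens on whitespace."""
    return text.split()


def naive_stem(token):
    """Toy stemmer (an assumption for illustration): strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(docs):
    """Return a sorted vocabulary and one row of term counts per document."""
    tokenized = [
        [naive_stem(t) for t in tokenize(remove_unwanted(d))] for d in docs
    ]
    vocab = sorted({t for doc in tokenized for t in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] for t in vocab])
    return vocab, rows


# Hypothetical sample documents.
docs = [
    "Users posted 3 reviews: https://example.com",
    "A user is posting reviews!",
]
vocab, matrix = bag_of_words(docs)
print(vocab)
print(matrix)
```

Each row of the resulting matrix is a numeric feature vector for one document, which is the kind of input a machine learning model expects. A real project would likely replace the toy stemmer with an established stemming or lemmatization library.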

Technical requirements

Optimus can work with multiple backend technologies to process data, including GPUs. For GPUs, Optimus uses RAPIDS ...

Data Processing with Optimus by Dr. Argenis Leon, Luis Aguirre
