Chapter 9: Natural Language Processing

Users of all sorts of software, from enterprise systems to social networks, create terabytes of text data every day. All this unprocessed data hides valuable opportunities to improve how businesses work.

In this chapter, we will learn how to clean and process text data so that we can create features from it and use those features as input to machine learning models.

The topics we will be covering in this chapter are as follows:

  • Natural language processing
  • Removing unwanted strings
  • Stemming and lemmatization
  • word_tokenizer
  • Feature extraction from text
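
As a rough preview of the workflow these topics describe, the following minimal sketch cleans two short documents, splits them into tokens, and turns them into bag-of-words count vectors. It uses only the Python standard library rather than the Optimus API, and the function names (clean, tokenize, bag_of_words) and the sample sentences are hypothetical, chosen only for illustration. Stemming and lemmatization are left out because they normally rely on an external library.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and strip URLs, punctuation, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # remove anything but letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def tokenize(text):
    """Split already-cleaned text into word tokens on whitespace."""
    return text.split()


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(clean(doc)) for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors


# Hypothetical sample data
docs = ["Great product!!! See https://example.com", "Not great, not bad."]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['bad', 'great', 'not', 'product', 'see']
print(vectors)  # [[0, 1, 0, 1, 1], [1, 1, 2, 0, 0]]
```

Each vector has one position per vocabulary word, so the two documents become fixed-length numeric rows that a machine learning model could take as input.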

Technical requirements

Optimus can work with multiple backend technologies to process data, including GPUs. For GPUs, Optimus uses RAPIDS ...
