
Learning PySpark by Denny Lee, Tomasz Drabas


Other features of PySpark ML in action

At the beginning of this chapter, we described most of the features of the PySpark ML library. In this section, we will provide examples of how to use some of the Transformers and Estimators.

Feature extraction

We have already used quite a few of the feature extractors from this submodule of PySpark (pyspark.ml.feature). In this section, we'll show you how to use the ones we find most useful.

NLP-related feature extractors

As described earlier, the NGram Transformer takes a list of tokenized words and produces n-grams: space-separated sequences of n consecutive words (pairs of words with the default n = 2).
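To make the output format concrete, here is a plain-Python sketch (not PySpark code) of what NGram computes for a single row, based on its documented behaviour: each n-gram is n consecutive tokens joined by a single space, and an input with fewer than n tokens yields no n-grams. The function name `ngrams` and the sample tokens are hypothetical, chosen only for illustration.

```python
def ngrams(tokens, n=2):
    """Return the n-grams of a token list as space-joined strings.

    Illustrative re-implementation of the semantics documented for
    pyspark.ml.feature.NGram; it is not Spark's actual code.
    """
    if n < 1:
        raise ValueError("n must be >= 1")  # NGram also requires n >= 1
    # A list shorter than n gives an empty range, so no n-grams are returned.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["spark", "is", "a", "fast", "engine"]
print(ngrams(tokens))       # ['spark is', 'is a', 'a fast', 'fast engine']
print(ngrams(tokens, n=3))  # ['spark is a', 'is a fast', 'a fast engine']
```

Note that stop words such as "is" and "a" survive into the n-grams here, which is one reason to clean the text before n-gram extraction.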

In this example, we will take an excerpt from PySpark's documentation and show how to clean up the text before passing it to NGram. Here's what our dataset looks like (abbreviated):

Tip

For the full ...
