February 2017
Intermediate to advanced
274 pages
5h 58m
English
At the beginning of this chapter, we described most of the features of the PySpark ML library. In this section, we will provide examples of how to use some of the Transformers and Estimators.
We have used quite a few of the transformers from the pyspark.ml.feature submodule. Here, we show how to use the ones we find most useful.
As described earlier, the NGram transformer takes a list of tokens (tokenized text) and produces n-grams: sequences of n consecutive words. With the default setting (n = 2), these are pairs of words.
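To make the transformation concrete, here is a plain-Python sketch of what NGram computes for a single row. It is not Spark's implementation; it assumes the documented behavior that each n-gram is the n consecutive tokens joined by a single space, and that an input shorter than n yields no n-grams. The function name `ngrams` is hypothetical. In PySpark, the equivalent stage would be `NGram(n=2, inputCol=..., outputCol=...)`.

```python
def ngrams(tokens, n=2):
    """Hypothetical helper mimicking NGram on one list of tokens.

    Joins every run of n consecutive tokens with a single space.
    Returns an empty list when there are fewer than n tokens.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # range() is empty when len(tokens) < n, so no n-grams are produced
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(ngrams(["spark", "ml", "feature", "ngram"]))
print(ngrams(["spark", "ml", "feature", "ngram"], n=3))
```

The first call prints the three bigrams `spark ml`, `ml feature`, and `feature ngram`; raising `n` to 3 produces the two trigrams instead.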
In this example, we take an excerpt from PySpark's documentation and show how to clean up the text before passing it to the NGram transformer. Here's what our dataset looks like (abbreviated):
For the full ...