Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Transformers

A transformer is an abstraction that includes feature transformers and learned models. The transformer implements the transform() method, which converts one DataFrame to another.

A feature transformer takes a DataFrame, reads a column (for example, text), maps it to a new column (for example, feature vectors), and outputs a new DataFrame with the mapped column appended.
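As a rough illustration of that contract, the sketch below uses plain Python rather than Spark. It assumes a DataFrame can be modeled as a list of row dictionaries; the `tokenize` function and the column names `text` and `words` are hypothetical names chosen for this example, not part of Spark's API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Frame = List[Row]  # assumption: stand-in for a DataFrame, one dict per row


def tokenize(frame: Frame, input_col: str, output_col: str) -> Frame:
    """Read input_col, append its lowercased whitespace tokens as output_col."""
    # Build new row dicts so the input frame is left unchanged.
    return [{**row, output_col: row[input_col].lower().split()} for row in frame]


docs = [
    {"id": 0, "text": "Spark ML pipelines"},
    {"id": 1, "text": "Transformers map DataFrames"},
]
tokenized = tokenize(docs, "text", "words")
```

The input rows are not modified; the result is a new frame that carries every original column plus the new one, which mirrors the "outputs a new DataFrame" behavior described above.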

A learned model takes a DataFrame, reads the column containing feature vectors, predicts the label for each feature vector, and outputs a new DataFrame with the predicted labels appended as a column.
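The same idea can be sketched for a model. The example below is a plain-Python illustration, not Spark code: the `LinearModel` class, its hand-picked weights, and the column names `features` and `prediction` are all assumptions made for this sketch, and a real model would obtain its weights from training.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class LinearModel:
    """Hypothetical already-fitted model that acts as a transformer."""

    def __init__(self, weights: List[float], bias: float,
                 features_col: str = "features",
                 prediction_col: str = "prediction") -> None:
        self.weights = weights
        self.bias = bias
        self.features_col = features_col
        self.prediction_col = prediction_col

    def transform(self, frame: List[Row]) -> List[Row]:
        out = []
        for row in frame:
            # Linear score of the feature vector, thresholded at zero.
            score = sum(w * x for w, x in zip(self.weights, row[self.features_col]))
            label = 1.0 if score + self.bias > 0 else 0.0
            out.append({**row, self.prediction_col: label})
        return out


# Weights are chosen by hand for illustration, not learned from data.
model = LinearModel(weights=[1.0, -1.0], bias=0.0)
scored = model.transform([{"features": [2.0, 1.0]}, {"features": [0.5, 3.0]}])
```

Each output row keeps its feature vector and gains a predicted label, so downstream steps can still see both columns.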

A custom transformer must do the following:

  1. Implement the transform method.
  2. Specify inputCol and outputCol.
  3. Accept DataFrame as input and return DataFrame as output.
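The three steps above can be sketched in plain Python. This is a library-free illustration under stated assumptions: a DataFrame is modeled as a list of row dictionaries, and `TextLength`, `input_col`, and `output_col` are hypothetical names for this example. In Spark itself, a custom transformer would instead extend the library's `Transformer` base class.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class TextLength:
    """Hypothetical custom transformer: appends the length of a text column."""

    def __init__(self, input_col: str, output_col: str) -> None:
        # Step 2: name the column to read and the column to write.
        self.input_col = input_col
        self.output_col = output_col

    def transform(self, frame: List[Row]) -> List[Row]:
        # Steps 1 and 3: accept a frame and return a new frame,
        # leaving the input rows unchanged.
        return [{**row, self.output_col: len(row[self.input_col])} for row in frame]


reviews = [{"text": "great"}, {"text": "not good"}]
with_length = TextLength("text", "length").transform(reviews)
```

Keeping the column names as constructor parameters means the same transformer can be reused on frames whose columns are named differently.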

In a nutshell, a transformer maps one DataFrame to another: DataFrame =[transform]=> DataFrame.
