O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Standard scalar

Standard scalar standardizes features of the data set by scaling to unit variance and removing the mean (optionally) using column summary statistics on the samples in the training set.

This process is a very common pre-processing step.

Standardization improves the convergence rate during the optimization process. It also prevents features with large variances from exerting an overly large influence during model training.

StandardScaler class has the following parameters in the constructor:

new StandardScaler(withMean: Boolean, withStd: Boolean)

  • withMean: False by default. Centers the data with mean before scaling. It will build a dense output, does not work on sparse input and will raise an exception.
  • withStd: True by default. ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required