O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Streaming regression

Spark provides a built-in streaming machine learning model in the StreamingLinearAlgorithm class. Currently, only a linear regression implementation is available-StreamingLinearRegressionWithSGD-but future versions will include classification.

The streaming regression model provides two methods for usage:

  • trainOn: This takes DStream[LabeledPoint] as its argument. This tells the model to train on every batch in the input DStream. It can be called multiple times to train on different streams.
  • predictOn: This also takes DStream[LabeledPoint]. This tells the model to make predictions on the input DStream, returning a new DStream[Double] that contains the model predictions.

Under the hood, the streaming regression model ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required