Streaming regression

Spark provides built-in streaming machine learning models through the StreamingLinearAlgorithm class. Currently, the only implementation is for linear regression (StreamingLinearRegressionWithSGD), but future versions will include classification.

The streaming regression model provides two methods:

  • trainOn: This takes a DStream[LabeledPoint] as its argument and tells the model to train on every batch in the input DStream. It can be called multiple times to train on different streams.
  • predictOn: This takes a DStream[Vector] of feature vectors (not labeled points) and tells the model to make predictions on the input DStream, returning a new DStream[Double] that contains the model predictions.
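The following is a minimal sketch of how the two methods fit together, assuming the spark-mllib and spark-streaming artifacts are on the classpath and a local master is acceptable. The synthetic data (label = 2*x0 - x1 + 0.5*x2), the queueStream source, the step size, the iteration count, and the object name StreamingRegressionSketch are illustrative choices, not taken from the text. In MLlib the model must be given initial weights with setInitialWeights before trainOn is called.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.mutable
import scala.util.Random

object StreamingRegressionSketch {
  val NumFeatures = 3

  /** Runs a short local streaming job and returns the model's final weights. */
  def run(): Vector = {
    // Local master with two threads and a 1-second batch interval (illustrative settings).
    val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingRegressionSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical synthetic data: label = 2*x0 - x1 + 0.5*x2 with Gaussian features.
    val rng = new Random(42)
    def makeBatch(n: Int): RDD[LabeledPoint] = {
      val points = Seq.fill(n) {
        val x = Array.fill(NumFeatures)(rng.nextGaussian())
        LabeledPoint(2.0 * x(0) - x(1) + 0.5 * x(2), Vectors.dense(x))
      }
      ssc.sparkContext.parallelize(points)
    }

    // queueStream turns a queue of RDDs into a DStream (one RDD per batch),
    // so the sketch needs no external data source.
    val queue = mutable.Queue[RDD[LabeledPoint]]()
    queue ++= Seq.fill(10)(makeBatch(100))
    val stream = ssc.queueStream(queue)

    // The model needs initial weights before trainOn is called.
    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(NumFeatures))
      .setNumIterations(10)
      .setStepSize(0.1)

    // trainOn: update the model on every batch of labeled points.
    model.trainOn(stream)

    // predictOn: takes feature vectors and returns a DStream[Double] of predictions.
    model.predictOn(stream.map(_.features)).print()

    ssc.start()
    // Give the ten queued batches time to be processed, then shut down.
    ssc.awaitTerminationOrTimeout(15000L)
    ssc.stop(stopSparkContext = true, stopGracefully = false)

    model.latestModel().weights
  }

  def main(args: Array[String]): Unit =
    println(s"Final weights: ${run()}")
}
```

When each prediction should stay paired with a key such as its true label (for example, to compute an error per batch), StreamingLinearAlgorithm also offers predictOnValues, which takes a DStream[(K, Vector)] and returns a DStream[(K, Double)].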

Under the hood, the streaming regression model ...
