Regression in MLlib

Spark MLlib has built-in methods for regression. To use them, you need pyspark installed on your cluster (standalone or distributed). It can be installed with the following command:

pip install pyspark

The MLlib library has the following regression methods:

  • Linear regression: We already learned about linear regression in earlier chapters; in Spark we can use it through the LinearRegression class. By default, it minimizes squared error with regularization. It supports L1 and L2 regularization, and a combination of the two.
  • Generalized linear regression: Spark MLlib supports a subset of exponential family distributions, such as Gaussian, Poisson, ...
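
To make the "squared error with regularization" objective from the linear regression item concrete, here is a minimal pure-Python sketch. It assumes the elastic-net form described in Spark's documentation for linear methods, where one parameter sets the overall penalty strength and another mixes L1 and L2. The function name elastic_net_objective and the toy data are hypothetical, chosen only for illustration; the arguments reg_param and elastic_net_param are meant to mirror the regParam and elasticNetParam settings of pyspark.ml.regression.LinearRegression. The intercept is left out for brevity.

```python
def elastic_net_objective(weights, rows, labels, reg_param=0.0, elastic_net_param=0.0):
    """Squared-error loss plus an elastic-net penalty (illustrative sketch).

    loss    = 1 / (2n) * sum((w . x - y)^2)
    penalty = reg_param * (elastic_net_param * ||w||_1
                           + (1 - elastic_net_param) / 2 * ||w||_2^2)
    """
    n = len(rows)
    # Sum of squared residuals over all training rows.
    squared_error = sum(
        (sum(w * x for w, x in zip(weights, row)) - y) ** 2
        for row, y in zip(rows, labels)
    )
    l1 = sum(abs(w) for w in weights)          # L1 norm of the weights
    l2_squared = sum(w * w for w in weights)   # squared L2 norm of the weights
    penalty = reg_param * (
        elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2_squared
    )
    return squared_error / (2.0 * n) + penalty


# Hypothetical toy data: three rows with two features each.
rows = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
labels = [5.0, 4.0, 8.0]
weights = [2.0, 1.0]

unregularized = elastic_net_objective(weights, rows, labels)
ridge = elastic_net_objective(weights, rows, labels, reg_param=0.1, elastic_net_param=0.0)
lasso = elastic_net_objective(weights, rows, labels, reg_param=0.1, elastic_net_param=1.0)
mixed = elastic_net_objective(weights, rows, labels, reg_param=0.1, elastic_net_param=0.5)

print(unregularized, ridge, lasso, mixed)
```

Under this assumed form, setting elastic_net_param to 0 gives a pure L2 (ridge) penalty, 1 gives a pure L1 (lasso) penalty, and values in between combine the two.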
