Regression in MLlib

Spark MLlib provides built-in methods for regression. To use them, you need pyspark installed on your cluster (standalone or distributed). You can install it with pip:

pip install pyspark

The MLlib library has the following regression methods:

  • Linear regression: We learned about linear regression in earlier chapters; in Spark, it is available through the LinearRegression class in the pyspark.ml.regression module. By default, it minimizes the squared error with regularization. It supports L1 and L2 regularization, as well as a combination of the two.
  • Generalized linear regression: Spark MLlib supports a subset of exponential family distributions, such as Gaussian, Poisson, ...
