Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Generalized linear regression

Ordinary linear regression assumes that the response variable follows a Gaussian distribution, whereas generalized linear models (GLMs) are specifications of linear models in which the response variable Y may follow any distribution from the exponential family of distributions.
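To make the distinction concrete, the usual textbook form of a GLM can be written as below. The symbols (the linear predictor \eta, the coefficients \beta, the link function g) are notation assumed here for illustration and do not come from the book's text.

```latex
\begin{aligned}
\eta &= \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
  && \text{(linear predictor)} \\
g\bigl(\mathbb{E}[Y \mid x]\bigr) &= \eta
  && \text{(link function } g \text{, } Y \text{ from the exponential family)} \\
\mathbb{E}[Y \mid x] &= \eta
  && \text{(special case: Gaussian family, identity link } g(\mu) = \mu \text{)}
\end{aligned}
```

In the last line the model reduces to ordinary linear regression, which is why linear regression can be viewed as one particular GLM.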

Let's split the bike sharing dataset into 80% training and 20% test data, use GeneralizedLinearRegression with the regression evaluator from Spark to build the model, and compute evaluation metrics on the test data.

@transient lazy val logger = Logger.getLogger(getClass.getName)

def genLinearRegressionWithVectorFormat(
    vectorAssembler: VectorAssembler,
    vectorIndexer: VectorIndexer,
    dataFrame: DataFrame) = {
  val lr = new GeneralizedLinearRegression()
    .setFeaturesCol("features")
  ...
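Because the listing above is truncated, here is a minimal, self-contained sketch of the same workflow (80/20 split, GeneralizedLinearRegression, RegressionEvaluator). It is not the book's code: the object name GlmSketch, the columns x1 and x2, the synthetic data standing in for the bike sharing dataset, the seeds, and the hyperparameter values are all assumptions chosen for illustration.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.GeneralizedLinearRegression
import org.apache.spark.sql.SparkSession

import scala.util.Random

object GlmSketch {

  // Trains a Gaussian GLM on synthetic data and returns the RMSE on the test split.
  def run(spark: SparkSession): Double = {
    import spark.implicits._

    // Hypothetical stand-in for the bike sharing data:
    // label = 3*x1 + 2*x2 + 1 plus uniform noise in [-0.5, 0.5).
    val rng = new Random(42L)
    val raw = (1 to 200).map { i =>
      val x1 = (i % 10).toDouble
      val x2 = (i % 7).toDouble
      (x1, x2, 3.0 * x1 + 2.0 * x2 + 1.0 + rng.nextDouble() - 0.5)
    }.toDF("x1", "x2", "label")

    // Pack the raw columns into the single vector column the estimator expects.
    val assembler = new VectorAssembler()
      .setInputCols(Array("x1", "x2"))
      .setOutputCol("features")
    val data = assembler.transform(raw)

    // 80% training, 20% test; the seed is fixed only for repeatability.
    val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 12345L)

    // Gaussian family with identity link corresponds to ordinary linear regression.
    val glr = new GeneralizedLinearRegression()
      .setFamily("gaussian")
      .setLink("identity")
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setMaxIter(10)
      .setRegParam(0.0)

    val model = glr.fit(training)
    val predictions = model.transform(test)

    // Root mean squared error on the held-out test data.
    new RegressionEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("rmse")
      .evaluate(predictions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GlmSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      println(s"Test RMSE: ${run(spark)}")
    } finally {
      spark.stop()
    }
  }
}
```

Changing the arguments of setFamily and setLink (for example to "poisson" with a "log" link for count data) is how a different exponential-family distribution would be selected; whether that suits the bike sharing counts better than the Gaussian default is a modelling choice this sketch does not evaluate.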
