Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Training a regression model on the bike sharing dataset

Linear regression

Linear regression is one of the most commonly used regression algorithms. At the core of regression analysis is the task of fitting a single straight line through the data points. The line is described by the equation y = c + b*x, where y is the estimated dependent variable, c is the constant (intercept), b is the regression coefficient (slope), and x is the independent variable.
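To make the equation concrete, here is a minimal sketch of fitting y = c + b*x by ordinary least squares in plain Python. The function name fit_line and the four sample points are made up for illustration; they do not come from the book or from Spark.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (c, b) that minimise the squared error of y = c + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - b * mean_x
    return c, b


# Illustrative points that lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
c, b = fit_line(xs, ys)
print(f"c = {c}, b = {b}")
```

With several independent variables, b becomes a vector of coefficients, one per feature, but the idea of minimising squared error stays the same.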

Let's train a model on the bike sharing dataset: split the data into 80% training and 20% test sets, build the model using LinearRegression with the regression evaluator from Spark, and compute evaluation metrics on the test data. The linearRegressionWithVectorFormat method uses categorical data, whereas linearRegressionWithSVMFormat uses the libsvm format of the bike sharing dataset. ...
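The split, fit, and evaluate workflow described above can be sketched in plain Python. This is a stand-in, not the book's Spark code: the data is synthetic (it is not the bike sharing dataset), the helper fit_line is hypothetical, and RMSE is assumed here as the evaluation metric.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the split is reproducible

# Synthetic stand-in data (NOT the bike sharing dataset): y = 10 + 3x + noise.
data = [(float(x), 10.0 + 3.0 * x + rng.uniform(-1.0, 1.0)) for x in range(100)]

# Shuffle, then split into 80% training and 20% test rows.
rng.shuffle(data)
cut = len(data) * 8 // 10
train, test = data[:cut], data[cut:]


def fit_line(rows):
    """Least-squares fit of y = c + b*x on (x, y) rows; returns (c, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Train on the training rows only.
c, b = fit_line(train)

# Evaluate on the held-out test rows with root mean squared error.
squared_errors = [(y - (c + b * x)) ** 2 for x, y in test]
rmse = math.sqrt(sum(squared_errors) / len(test))
print(f"train={len(train)} test={len(test)} c={c:.3f} b={b:.3f} rmse={rmse:.3f}")
```

Evaluating only on rows the model never saw during fitting is what makes the metric an estimate of performance on new data, which is why the split comes before training.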
