Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Linear regression

Linear regression involves a little more work than computing statistics. We need the data in vector form, along with a few more parameters, such as the learning rate (that is, the step size). We will also split the Dataset into training and test sets, as shown later in this chapter.
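As a rough illustration of the training/test split mentioned above, here is a minimal sketch using `randomSplit` on a DataFrame. It assumes Spark 2.x is on the classpath and runs in local mode; the `SplitSketch` object, the `split` helper, the toy `id` column, the 90/10 ratio, and the seed are all hypothetical choices for this example, not the book's code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SplitSketch {
  // Hypothetical helper: randomly splits df into (training, test).
  // The fractions are approximate; randomSplit does not guarantee exact sizes.
  def split(df: DataFrame, trainFraction: Double, seed: Long): (DataFrame, DataFrame) = {
    val Array(train, test) =
      df.randomSplit(Array(trainFraction, 1.0 - trainFraction), seed)
    (train, test)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SplitSketch")
      .getOrCreate()

    // Toy data standing in for the real Dataset (assumption for illustration)
    val data = spark.range(100).toDF("id")

    val (train, test) = split(data, 0.9, seed = 42L)
    println(s"training rows: ${train.count()}, test rows: ${test.count()}")

    spark.stop()
  }
}
```

Fixing the seed makes the split repeatable between runs, which helps when comparing models trained on the same data.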

Data transformation and feature extraction

The ml.feature package has a class, VectorAssembler, that combines a given list of input columns into a single vector column of features:

    import org.apache.spark.ml.feature.VectorAssembler

    //
    // Linear Regression
    //
    // Transformation to labeled data that Linear Regression can use
    val cars1 = cars.na.drop() // drop rows that contain null values
    val assembler = new VectorAssembler()
    assembler.setInputCols(Array(
      "displacement", "hp", "torque", "CRatio", "RARatio", "CarbBarrells",
      "NoOfSpeed", "length", "width", "weight", "automatic"))
    assembler.setOutputCol("features") // name of the assembled vector column
    ...
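To make the effect of the assembler concrete, the following is a small self-contained sketch that applies a `VectorAssembler` to a two-row toy DataFrame. It assumes Spark 2.x in local mode; the `AssemblerSketch` object, the `assemble` helper, and the toy values are made up for illustration, and only two of the listing's column names are reused.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object AssemblerSketch {
  // Hypothetical helper: appends a "features" vector column built from inputCols.
  def assemble(df: DataFrame, inputCols: Array[String]): DataFrame =
    new VectorAssembler()
      .setInputCols(inputCols)
      .setOutputCol("features")
      .transform(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AssemblerSketch")
      .getOrCreate()
    import spark.implicits._

    // Toy rows (made-up values): a label column plus two numeric feature columns
    val toy = Seq(
      (18.0, 307.0, 130.0),
      (15.0, 350.0, 165.0)
    ).toDF("mpg", "displacement", "hp")

    // Each row's "features" value is a vector of (displacement, hp)
    val assembled = assemble(toy, Array("displacement", "hp"))
    assembled.select("mpg", "features").show(truncate = false)

    spark.stop()
  }
}
```

The original columns stay in the output; `transform` only appends the vector column, so the label column remains available for the regression step.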