Examining the spark-linear-regression.py script

Open up the spark-linear-regression.py file and have a look at the code. First, we'll import the LinearRegression class from the ML library's regression module:

from pyspark.ml.regression import LinearRegression 

Note that we're using ml instead of MLlib here. The ml package is where the newer DataFrame-based APIs live, and going forward, that's where Spark wants you to be working. We're also going to import SparkSession, which is our entry point for working with DataFrames, and Vectors, which we're going to need in order to represent our feature data within our algorithm:

from pyspark.sql import SparkSession 
from pyspark.ml.linalg import Vectors 

Let's go ahead and look at the script itself, down in line 11. We'll start by creating a SparkSession ...

Excerpt from Frank Kane's Taming Big Data with Apache Spark and Python.
