O'Reilly logo

Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

SparkSession

SparkSession allows programming with the DataFrame and Dataset APIs. It is a single point of entry for these APIs.

First, we need to create an instance of the SparkConf class and use it to create the SparkSession instance. Consider the following example:

val spConfig = (new SparkConf).setMaster("local").setAppName("SparkApp") val spark = SparkSession   .builder()   .appName("SparkUserData").config(spConfig)   .getOrCreate()

Next we can use spark object to create a DataFrame:

val user_df = spark.read.format("com.databricks.spark.csv")   .option("delimiter", "|").schema(customSchema)   .load("/home/ubuntu/work/ml-resources/spark-ml/data/ml-100k/u.user")val first = user_df.first()

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required