Standalone programs

So far, we have been using Spark SQL and DataFrames through the Spark shell. To use them in standalone programs, you will need to create the SQL context explicitly, from a Spark context:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("applicationName")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
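
Putting this together, a minimal standalone application might look like the following sketch; the object name is illustrative, not from the book:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkSQLExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkSQLExample")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // ... use sqlContext to create and query DataFrames ...

    sc.stop()  // release cluster resources when done
  }
}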

Additionally, importing the implicits object nested in sqlContext enables the conversion of RDDs to DataFrames:

import sqlContext.implicits._
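
For instance, with the implicits in scope, an RDD of case class instances gains a toDF method. A minimal sketch (the Person class and sample data here are hypothetical, chosen for illustration):

// A hypothetical case class; define it at the top level, outside the
// method that uses it, so Spark can infer the DataFrame schema.
case class Person(name: String, age: Int)

val people = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 35)))
val peopleDF = people.toDF()  // available thanks to sqlContext.implicits._
peopleDF.show()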

We will use DataFrames extensively in the next chapter to manipulate data to get it ready for use with MLlib.
