January 2016
So far, we have been using Spark SQL and DataFrames through the Spark shell, which creates a SQL context for us. To use them in a standalone program, you need to create the SQL context explicitly, from a Spark context:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("applicationName")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
Additionally, importing the members of the implicits object nested in sqlContext enables the conversion of RDDs to DataFrames:
import sqlContext.implicits._
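As a minimal sketch of how these pieces fit together in a standalone program, the following application builds a small DataFrame from an RDD and filters it. The Reading case class, its field names, the sample values, and the tallPatientIds helper are hypothetical and exist only for illustration. The call to setMaster("local[*]") is also an assumption made so that the sketch can run on a single machine; when launching through spark-submit, you would normally omit it and pass the master on the command line instead. The sketch targets the Spark 1.x API used in this chapter.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type, used only for illustration. It is defined
// at the top level so that Spark can infer a schema from its fields.
case class Reading(patientId: Int, heightCm: Int)

object StandaloneDataFrameExample {

  // Builds a DataFrame from an in-memory RDD and returns the IDs of
  // the readings with a height above 170 cm, in ascending order.
  def tallPatientIds(sqlContext: SQLContext): Array[Int] = {
    // Brings toDF and the $"column" syntax into scope.
    import sqlContext.implicits._

    val readings = sqlContext.sparkContext.parallelize(Seq(
      Reading(1, 188),
      Reading(2, 164),
      Reading(3, 178)
    ))

    // The implicit conversion turns the RDD of case classes into a DataFrame.
    val df = readings.toDF()

    df.filter($"heightCm" > 170)
      .select("patientId")
      .collect()
      .map(_.getInt(0))
      .sorted
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val conf = new SparkConf()
      .setAppName("applicationName")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    try {
      println(tallPatientIds(sqlContext).mkString(", "))
    } finally {
      // Release the resources held by the Spark context.
      sc.stop()
    }
  }
}
```

Note that the import of sqlContext.implicits._ must come after sqlContext has been created and bound to a val, since the implicits object is a member of that particular instance.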
We will use DataFrames extensively in the next chapter to manipulate data to get it ready for use with MLlib.