July 2017
Intermediate to advanced
796 pages
18h 55m
English
Spark SQL can read data from external storage systems such as files, Hive tables, and JDBC databases through the DataFrameReader interface.
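The general form of the API call is spark.read.inputtype, where inputtype stands for a format-specific method of DataFrameReader such as csv, json, or parquet.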
Let's look at a couple of simple examples of reading CSV files into DataFrames:
scala> val statesPopulationDF = spark.read.option("header", "true").option("inferschema", "true").option("sep", ",").csv("statesPopulation.csv")
statesPopulationDF: org.apache.spark.sql.DataFrame = [State: string, Year: int ... 1 more field]

scala> val statesTaxRatesDF = spark.read.option("header", "true").option("inferschema", "true").option("sep", ",").csv("statesTaxRates.csv")
statesTaxRatesDF: org.apache.spark.sql.DataFrame ...
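The CSV reads above use the csv shorthand and let Spark infer column types. As a minimal sketch, the same kind of read can also be written with the generic format(...).load(...) form and an explicit schema, which avoids the extra pass over the data that schema inference needs. The object name ReadCsvSketch, its helper methods, and the two-column sample file are hypothetical and exist only for illustration; they are not the book's datasets. A standalone program has to build its own SparkSession, whereas spark-shell already provides one as spark.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ReadCsvSketch {
  // Hypothetical sample data, written to a temp file so the sketch is self-contained.
  def writeSample(): Path = {
    val path = Files.createTempFile("sample", ".csv")
    Files.write(path, "State,Year\nAlabama,2010\nAlaska,2010\n".getBytes("UTF-8"))
    path
  }

  // Shorthand form: csv(...) with the column types inferred from the data.
  def readInferred(spark: SparkSession, path: Path): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path.toString)

  // Generic form: format("csv") plus load(...), with the schema supplied up front.
  def readExplicit(spark: SparkSession, path: Path): DataFrame = {
    val schema = StructType(Seq(
      StructField("State", StringType, nullable = true),
      StructField("Year", IntegerType, nullable = true)))
    spark.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .load(path.toString)
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell, reuse the existing `spark` instead.
    val spark = SparkSession.builder()
      .appName("ReadCsvSketch")
      .master("local[*]")
      .getOrCreate()

    val path = writeSample()
    readInferred(spark, path).printSchema()
    readExplicit(spark, path).show()

    spark.stop()
  }
}
```

Other sources follow the same pattern: swapping the format name (for example "json" or "parquet") or calling the matching shorthand method changes the input type while the rest of the reader chain stays the same.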