July 2017
Intermediate to advanced
796 pages
18h 55m
English
Let us look at an example of loading a CSV (comma-separated values) file into a DataFrame. When a text file contains a header line, the read API can take the column names from it. With the inferSchema option enabled, it also infers the column types by scanning the data. We can also specify the separator used to split each line of the text file.
We read the CSV file, taking column names from the header line, inferring column types, and using a comma (,) as the separator. We also use the schema and printSchema commands to verify the schema of the input file.
scala> val statesDF = spark.read.option("header", "true").option("inferschema", "true").option("sep", ",").csv("statesPopulation.csv")
statesDF: org.apache.spark.sql.DataFrame = [State: string, Year: int ... 1 more field]

scala> statesDF.schema
res92: org.apache.spark.sql.types.StructType ...
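The Spark reader hides the mechanics, so here is a toy sketch in plain Scala that runs without a Spark cluster. It takes column names from the header line, splits fields on a configurable separator, and guesses a column's type by checking whether every value parses as an integer. This is an illustration of the idea only, not Spark's implementation. Spark's inference handles many more types as well as quoting and nulls. The object ToyCsvSchema, its methods, and the sample rows are all hypothetical. The treeString helper only imitates the layout of Spark's printSchema() output.

```scala
import scala.util.Try

// Toy illustration only: NOT Spark's CSV reader. It mimics the idea behind
// option("header", "true"), option("inferSchema", "true") and option("sep", ",").
object ToyCsvSchema {

  // Simplified column types; Spark's real inference supports many more.
  sealed trait ColType { def name: String }
  case object IntCol extends ColType { val name = "integer" }
  case object StringCol extends ColType { val name = "string" }

  // A column is treated as integer only if every value parses as an Int.
  private def inferColumn(values: Seq[String]): ColType =
    if (values.nonEmpty && values.forall(v => Try(v.trim.toInt).isSuccess)) IntCol
    else StringCol

  // String.split takes a regex, so the separator is quoted to be taken literally.
  private def splitLine(line: String, sep: String): Seq[String] =
    line.split(java.util.regex.Pattern.quote(sep), -1).map(_.trim).toSeq

  // The header line supplies column names; the remaining lines supply values for typing.
  def inferSchema(lines: Seq[String], sep: String = ","): Seq[(String, ColType)] = {
    val header = splitLine(lines.head, sep)
    val rows = lines.tail.map(splitLine(_, sep))
    header.zipWithIndex.map { case (colName, i) =>
      // A missing field becomes "", which forces the column to string.
      colName -> inferColumn(rows.map(r => if (i < r.length) r(i) else ""))
    }
  }

  // Renders the schema in the same tree layout that Spark's printSchema() prints.
  def treeString(schema: Seq[(String, ColType)]): String =
    ("root" +: schema.map { case (n, t) => s" |-- $n: ${t.name} (nullable = true)" }).mkString("\n")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows; not the contents of statesPopulation.csv.
    val sample = Seq(
      "State,Year,Value",
      "StateA,2010,10",
      "StateB,2011,20"
    )
    println(treeString(inferSchema(sample)))
  }
}
```

For the hypothetical sample, the sketch infers State as a string column, while Year and Value are inferred as integer columns. Passing a different separator, for example inferSchema(lines, "|"), mirrors the role of option("sep", ...).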