July 2017
Intermediate to advanced
796 pages
18h 55m
English
A DataFrame can be created in several ways.
One way is to load a CSV file. Here we load the file statesPopulation.csv into a DataFrame.
The CSV holds US state populations for the years 2010 to 2016, in the following format:
| State | Year | Population |
| --- | --- | --- |
| Alabama | 2010 | 4785492 |
| Alaska | 2010 | 714031 |
| Arizona | 2010 | 6408312 |
| Arkansas | 2010 | 2921995 |
| California | 2010 | 37332685 |
Since this CSV has a header row, Spark can take the column names from it, and with schema inference enabled it can also detect the column types, so the file loads into a DataFrame without an explicit schema.
scala> val statesDF = spark.read.option("header", "true").option("inferschema", ...
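The command above is cut off in this excerpt, so the following is only a minimal, self-contained sketch of the same idea, not the book's exact code. It assumes a local Spark session, and it writes the five sample rows shown earlier to a temporary file so that it can run without the real statesPopulation.csv. The object name `LoadStatesPopulation` and the helpers `writeSample` and `load` are hypothetical names introduced for illustration; the option key is spelled `inferSchema`, as in the Spark documentation.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadStatesPopulation {

  // Hypothetical helper: writes the sample rows from the table above to a
  // temporary CSV file and returns its path. With the real file, skip this
  // step and pass the path to statesPopulation.csv instead.
  def writeSample(): String = {
    val sample =
      """State,Year,Population
        |Alabama,2010,4785492
        |Alaska,2010,714031
        |Arizona,2010,6408312
        |Arkansas,2010,2921995
        |California,2010,37332685
        |""".stripMargin
    val path = Files.createTempFile("statesPopulation", ".csv")
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))
    path.toString
  }

  // Reads a CSV whose first line is a header and lets Spark infer the
  // column types by scanning the data.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")      // use the first line as column names
      .option("inferSchema", "true") // detect column types from the values
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation. In spark-shell a
    // session named `spark` already exists and this step is unnecessary.
    val spark = SparkSession.builder()
      .appName("LoadStatesPopulation")
      .master("local[*]")
      .getOrCreate()

    val statesDF = load(spark, writeSample())
    statesDF.printSchema() // prints the inferred column names and types
    statesDF.show()        // prints the loaded rows

    spark.stop()
  }
}
```

Schema inference requires an extra pass over the data, so for large files an explicit schema is often the cheaper choice; for a small file like this one, inference is convenient.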