February 2017
Intermediate to advanced
274 pages
5h 58m
English
There are two different methods for converting existing RDDs to DataFrames (or Datasets[T]): inferring the schema using reflection, or programmatically specifying the schema. The former allows you to write more concise code (when your Spark application already knows the schema), while the latter allows you to construct DataFrames when the columns and their data types are only revealed at run time. Note, reflection is in reference to schema reflection as opposed to Python reflection.
In the process of building the DataFrame and running the queries, we skipped over the fact that the schema for this DataFrame was automatically defined. Initially, row objects are constructed by passing a ...
Read now
Unlock full access