Schema: structure of data
A schema describes the structure of your data and can be either implicit (inferred from the data) or explicit (declared up front).
Because DataFrames are built internally on RDDs, there are two main ways to convert an existing RDD into a dataset. The first uses reflection to infer the schema of the RDD. The second uses a programmatic interface: you take an existing RDD and supply a schema, which converts the RDD into a dataset with that schema.
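The programmatic route can be sketched as follows. This is a minimal example, not the book's own listing: the object name `ExplicitSchemaExample`, the `build` helper, the column names `name` and `age`, and the sample rows are all assumptions made for illustration. It assumes Spark 2.x or later running in local mode.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object; names and data are illustrative only.
object ExplicitSchemaExample {

  // Builds a DataFrame from an RDD[Row] plus an explicitly declared schema.
  def build(spark: SparkSession): DataFrame = {
    // An existing RDD of generic Row objects (sample data, assumed).
    val rowRDD = spark.sparkContext.parallelize(Seq(
      Row("Alice", 34),
      Row("Bob", 45)
    ))

    // The explicit schema: one StructField per column.
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)
    ))

    // Attach the schema to the RDD to obtain a DataFrame.
    spark.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExplicitSchemaExample")
      .master("local[*]")
      .getOrCreate()

    build(spark).printSchema()
    spark.stop()
  }
}
```

Declaring the schema by hand is mainly useful when the column layout is only known at runtime, or when the rows are not instances of a case class.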
To create a DataFrame from an RDD by inferring the schema through reflection, the Scala API for Spark lets you use case classes to define the schema of the table. The DataFrame is created programmatically ...
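A minimal sketch of the reflection-based route is shown here. The case class `Person`, the object name `InferredSchemaExample`, the `build` helper, and the sample records are hypothetical and chosen only for illustration; the example assumes Spark 2.x or later running in local mode.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type: its field names and types become the schema.
case class Person(name: String, age: Int)

// Hypothetical example object; names and data are illustrative only.
object InferredSchemaExample {

  // Builds a DataFrame whose schema is inferred from the Person case class.
  def build(spark: SparkSession): DataFrame = {
    // Brings the RDD-to-DataFrame conversions (toDF) into scope.
    import spark.implicits._

    // An existing RDD of case class instances (sample data, assumed).
    val peopleRDD = spark.sparkContext.parallelize(Seq(
      Person("Alice", 34),
      Person("Bob", 45)
    ))

    // Reflection on Person supplies the column names and types.
    peopleRDD.toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("InferredSchemaExample")
      .master("local[*]")
      .getOrCreate()

    build(spark).printSchema()
    spark.stop()
  }
}
```

The case class is defined at the top level rather than inside a method, since Spark needs to resolve its type information when deriving the schema.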