January 2016
Intermediate to advanced
416 pages
8h 54m
English
DataFrames, like RDDs, are immutable. When you define a transformation on a DataFrame, this always creates a new DataFrame. The original DataFrame cannot be modified in place (this is notably different to pandas DataFrames, for instance).
Operations on DataFrames can be grouped into two: transformations, which result in the creation of a new DataFrame, and actions, which usually return a Scala type or have a side-effect. Methods like filter or withColumn are transformations, while methods like show or head are actions.
Transformations are lazy, much like transformations on RDDs. When you generate a new DataFrame by transforming an existing DataFrame, this results in the elaboration of an execution plan for ...
Read now
Unlock full access