July 2018
Intermediate to advanced
474 pages
13h 37m
English
The pyspark dataframe shares some features with the pandas dataframe, including the ability to compute summary statistics on specific columns.
In pandas, we compute summary statistics for a column using the following script: dataframe['column'].describe().
In pyspark, we compute summary statistics for a column using the following script: dataframe.describe('column').show().
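The two calls above can be tried side by side with a minimal sketch like the one below. The helper names `pandas_summary` and `spark_summary`, the column name, the sample values, and the local Spark session settings are all assumptions made for illustration; they do not come from the book. The pyspark half is wrapped in a function and is not called here, because it needs a working Spark installation.

```python
import pandas as pd


def pandas_summary(values, column="column"):
    """Summary statistics for one column of a pandas dataframe (hypothetical helper)."""
    dataframe = pd.DataFrame({column: values})
    # describe() on a numeric column returns count, mean, std, min,
    # the 25%/50%/75% quantiles, and max as a pandas Series.
    return dataframe[column].describe()


def spark_summary(values, column="column"):
    """The same idea in pyspark (hypothetical helper); requires Spark to be installed."""
    # Imported inside the function so the pandas half runs without pyspark.
    from pyspark.sql import SparkSession

    # Assumed local session settings, chosen only for this illustration.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("describe-demo")
        .getOrCreate()
    )
    dataframe = spark.createDataFrame([(float(v),) for v in values], [column])
    # describe() returns a new dataframe holding count, mean, stddev, min, and max;
    # show() prints that dataframe to the console.
    dataframe.describe(column).show()
    spark.stop()


# Sample values invented for the example.
stats = pandas_summary([1.0, 2.0, 3.0, 4.0])
print(stats)
```

Calling `spark_summary([1.0, 2.0, 3.0, 4.0])` on a machine with Spark available should print the corresponding pyspark table. One difference to expect: the pandas result includes the quartiles, while the pyspark `describe` output does not.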