June 2017
Beginner to intermediate
576 pages
15h 22m
English
One of the first things I do upon creating a new data object is to run summary statistics. There is a Spark-specific version of the R summary function known as describe(). You can also use the Spark-specific summary() function; however, if you do this instead of using describe(), I would prefix it with SparkR:: in order to specify which version of summary you are using:
head(SparkR::summary(out_sd))
The output appears in a slightly different format than if you ran summary on a native R dataframe, but it contains the basic measures that you are looking for: count, mean, stddev, min, and max.
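For comparison, here is a minimal sketch of the describe() route mentioned earlier. It assumes a local Spark installation that SparkR can find, and it uses R's built-in faithful dataset as a hypothetical stand-in for out_sd, so the column name waiting is an assumption of this sketch and not part of the original example:

```r
library(SparkR)

# Start (or reuse) a local Spark session.
# Assumption: Spark is installed and discoverable (for example via SPARK_HOME).
sparkR.session(master = "local[*]")

# Hypothetical stand-in for out_sd: the built-in faithful data as a SparkDataFrame.
out_sd <- as.DataFrame(faithful)

# describe() returns a SparkDataFrame of summary statistics;
# head() collects the first rows back into a local R data.frame.
desc_local <- head(SparkR::describe(out_sd))
print(desc_local)

# Naming a column restricts the statistics to that column only.
desc_waiting <- head(SparkR::describe(out_sd, "waiting"))
print(desc_waiting)

sparkR.session.stop()
```

Because head() returns an ordinary R data.frame, the collected statistics can be handled with the usual base R tools once they are on the driver.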

We can also compare this summary ...