Practical Predictive Analytics by Ralph Winters

Running summary statistics

One of the first things I do upon creating a new data object is to run summary statistics. SparkR provides its own version of the R summary function, known as describe(). You can also use the function summary(); however, if you do this instead of using describe(), I would prefix it with SparkR:: in order to specify which version of summary you are using:

head(SparkR::summary(out_sd)) 
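As a rough illustration of the two calls side by side, here is a minimal sketch. It assumes a local Spark installation that SparkR can find, and it uses a hypothetical stand-in DataFrame, demo_sd, built from R's built-in faithful dataset, since the construction of out_sd is not shown here. The rows returned may vary with the Spark version.

```r
library(SparkR)

# Assumption: Spark is installed locally and SPARK_HOME is set.
sparkR.session(master = "local[*]")

# Hypothetical stand-in for out_sd: a Spark DataFrame
# created from R's built-in faithful dataset.
demo_sd <- as.DataFrame(faithful)

# describe() returns a Spark DataFrame of summary statistics;
# collect() brings it back as a native R data.frame for printing.
print(collect(describe(demo_sd)))

# The namespaced summary() call, mirroring the form used in the text.
print(collect(SparkR::summary(demo_sd)))

# Shut down the session when finished.
sparkR.session.stop()
```

Prefixing with SparkR:: matters mainly when other attached packages also define summary(), since it makes the intended method explicit.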

The output appears in a slightly different format than if you ran a summary on a native R dataframe, but it contains the basic measures that you are looking for: count, mean, stddev, min, and max.

We can also compare this summary ...