One of the first things I do upon creating a new data object is to run summary statistics. SparkR provides a Spark-specific analogue of R's summary function called describe(). You can also use summary() itself; however, if you do this instead of using describe(), I would prefix it with SparkR:: to make explicit which version of summary you are calling:
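A minimal sketch of both calls, assuming an active Spark session and a hypothetical SparkDataFrame df built from the built-in mtcars data:

```r
library(SparkR)

# Assumes Spark is installed and a session can be started locally
sparkR.session()

# Hypothetical example data: convert a local R data frame to a SparkDataFrame
df <- createDataFrame(mtcars)

# Spark-specific summary statistics; returns another SparkDataFrame
showDF(describe(df))

# Equivalent call, namespaced so it is not confused with base R's summary()
showDF(SparkR::summary(df))
```

Because describe() returns a SparkDataFrame rather than printing directly, wrapping it in showDF() (or head()) is what actually displays the statistics.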
The output appears in a slightly different format than if you ran summary() on a native R data frame, but it contains the basic measures you are looking for: count, mean, stddev, min, and max:
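If you want those statistics as a local R data frame, for instance to compare them side by side with base R's output, you can collect the small result back to the driver. A sketch, again assuming the hypothetical df from above:

```r
library(SparkR)
sparkR.session()
df <- createDataFrame(mtcars)

# collect() pulls the (small) summary table back as a native R data frame,
# with one row per statistic: count, mean, stddev, min, max
stats <- collect(describe(df))
print(stats)
```

Collecting is safe here because describe() always produces only a handful of rows, regardless of how large the underlying SparkDataFrame is.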
We can also compare this summary ...