How it works...

We created an RDD from dense vector data and then computed summary statistics on it using the Statistics object. The colStats() method returns a summary object, from which we retrieved column-wise statistics such as the mean, variance, minimum, and maximum.
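The flow described above can be sketched roughly as follows. This is a minimal illustration rather than the recipe's exact code: the object name `SummaryStatsSketch`, the local master setting, and the sample vectors are assumptions made for the example. The `Statistics.colStats()` call and the accessors on the returned `MultivariateStatisticalSummary` come from Spark's `org.apache.spark.mllib.stat` package.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, chosen for this sketch only.
object SummaryStatsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SummaryStatsSketch")
      .getOrCreate()

    // Made-up sample data: each dense vector is one row, each position one column.
    val rdd = spark.sparkContext.parallelize(Seq(
      Vectors.dense(1.0, 10.0, 100.0),
      Vectors.dense(2.0, 20.0, 200.0),
      Vectors.dense(3.0, 30.0, 300.0)
    ))

    // colStats() computes the statistics column by column over the RDD.
    val summary: MultivariateStatisticalSummary = Statistics.colStats(rdd)

    // Each accessor returns one value per column (count is a single row count).
    println(s"mean:     ${summary.mean}")
    println(s"variance: ${summary.variance}")
    println(s"min:      ${summary.min}")
    println(s"max:      ${summary.max}")
    println(s"count:    ${summary.count}")
    println(s"nonzeros: ${summary.numNonzeros}")

    spark.stop()
  }
}
```

Each statistic is reported per column, so a three-element input vector yields three-element results for the mean, variance, minimum, and maximum.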
