Powerful Exploratory Data Analysis with MLlib

In this chapter, we will explore Spark's capability to perform regression tasks with models such as linear regression and support-vector machines (SVMs). We will learn how to compute summary statistics with MLlib and how to discover correlations in datasets using the Pearson and Spearman methods. We will also test our hypotheses on large datasets.

We will cover the following topics:

  • Computing summary statistics with MLlib
  • Using the Pearson and Spearman methods to discover correlations
  • Testing our hypotheses on large datasets
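To make the difference between the two correlation methods concrete before we turn to MLlib, here is a minimal plain-Python sketch. It is not MLlib code: the helper names `pearson`, `ranks`, and `spearman` and the sample data are assumptions made purely for illustration. In MLlib itself, the equivalent computation is available through `pyspark.mllib.stat.Statistics.corr`, whose `method` argument accepts `"pearson"` or `"spearman"`.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical sample: y grows monotonically with x, but not linearly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

print("Pearson: ", pearson(xs, ys))
print("Spearman:", spearman(xs, ys))
```

Because `ys` always increases with `xs`, the rank-based Spearman coefficient reports a perfect monotonic relationship, while the Pearson coefficient comes out lower because the relationship is not a straight line. That is the practical reason to check both when exploring a dataset.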
