
Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Chapter 3. Data Analysis with Spark

In this chapter, we will cover the following recipes on performing data analysis with Spark:

  • Univariate analysis
  • Bivariate analysis
  • Missing value treatment
  • Outlier detection
  • Use case - analyzing the MovieLens dataset
  • Use case - analyzing the Uber dataset

Introduction

Data exploration and preparation techniques are typically applied before any models are fitted to the data, and they also help in developing complex statistical models. These techniques are also important for eliminating or sharpening potential hypotheses that the data can address. The time spent on preprocessing and data exploration determines the quality of the input, which in turn decides the quality of the output. Once the business hypothesis is ...
