Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Chapter 2. Tricky Statistics with Spark

In this chapter, you will learn the following recipes:

  • Working with Pandas
  • Variable identification
  • Sampling data
  • Summary and descriptive statistics
  • Generating frequency tables
  • Installing Pandas on Linux
  • Installing Pandas from source
  • Using IPython with PySpark
  • Creating Pandas DataFrames over Spark
  • Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
  • Implementing covariance and correlation using DataFrames over Spark
  • Concatenating and merging operations over DataFrames
  • Complex operations over DataFrames
  • Sparkling Pandas

Introduction

Statistics refers to the mathematics and techniques with which we understand data. It is a vast field that plays a key role in data mining and artificial ...
