Chapter 2. Tricky Statistics with Spark
In this chapter, you will learn the following recipes:
- Working with Pandas
- Variable identification
- Sampling data
- Summary and descriptive statistics
- Generating frequency tables
- Installing Pandas on Linux
- Installing Pandas from source
- Using IPython with PySpark
- Creating Pandas DataFrames over Spark
- Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
- Implementing covariance and correlation using DataFrames over Spark
- Concatenating and merging operations over DataFrames
- Complex operations over DataFrames
- Sparkling Pandas
Statistics refers to the mathematics and techniques we use to understand data. It is a vast field that plays a key role in the areas of data mining and artificial ...