December 2018
Beginner to intermediate
330 pages
8h 32m
English
In Chapter 2, Problem Understanding and Data Preparation, we produced descriptive statistics for the numerical features while preparing this dataset. We used them to spot possible problems with some of the values, relying on the mean, the standard deviation, and some of the percentiles to decide whether some of the large observed values could be considered outliers.
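As a reminder of what that kind of check can look like, here is a minimal sketch. The values are made up for illustration and are not taken from our dataset, and the cutoff of two standard deviations is only a rule of thumb assumed for this example, not necessarily the criterion applied in Chapter 2.

```python
import pandas as pd

# Hypothetical values for illustration only; not from the book's dataset.
values = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 50.0])

mean = values.mean()          # arithmetic mean
std = values.std()            # sample standard deviation (ddof=1)
p75 = values.quantile(0.75)   # 75th percentile, for comparison

# Assumed rule of thumb: flag values more than 2 standard deviations
# away from the mean as candidate outliers.
z_scores = (values - mean) / std
flagged = values[z_scores.abs() > 2]

print(f"mean={mean:.2f}, std={std:.2f}, 75th percentile={p75:.2f}")
print(flagged)
```

A flagged value is only a candidate: whether it is a real outlier or a legitimate large observation still depends on what we know about the feature.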
Here, we will try to extract more information from these descriptive statistics and gain more understanding of each of our features.
For the numerical EDA, we will calculate the most commonly used descriptive statistics. They are so common that the pandas Series method describe() calculates them for us: count, mean, ...
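The following sketch shows what describe() returns for a numerical Series. The Series name and its values are hypothetical stand-ins, since the dataset itself is not loaded here.

```python
import pandas as pd

# Hypothetical numerical feature; the name and values are illustrative only.
prices = pd.Series([10.0, 12.0, 15.0, 20.0, 43.0], name="price")

# For numerical data, describe() returns a Series indexed by statistic name.
summary = prices.describe()
print(summary)

# Individual statistics can be read by label.
print(summary["mean"], summary["50%"])
```

Because the result is itself a Series, each statistic can be picked out by its label and reused in later calculations.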