We started the analysis by using a very simple, yet powerful method of a pandas DataFrame—describe. It printed summary statistics, such as count, mean, min/max, and quartiles of all the numeric variables in the DataFrame. By inspecting these metrics, we could infer the value range of a certain feature, or whether the distribution was skewed (by looking at the difference between mean and median). Also, we could easily spot values outside the plausible range—for example, a negative or very small age.
The count metric represents the number of non-null observations, so it is also ...