Execute the following steps to carry out EDA.
- Import the libraries:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
- Get summary statistics for numeric variables:
df.describe().transpose().round(2)
This results in the following table:
- Get summary statistics for categorical variables:
df.describe(include='object').transpose()
This results in the following table:
- Plot the distribution of age and, additionally, split it by gender:
fig, ax = plt.subplots()
sns.distplot(df.loc[df.sex=='Male', 'age'].dropna(), hist=False, ...