Bar charts are one of the most common ways to visualize a dataset. The idea is that, if an attribute can only take values from a specific set, the distribution of counts across those unique values can give us insight into which factors might affect the dependent variable we are interested in (in this case, whether a person has Parkinson's or not).
But first, we will use a bar chart to visualize how much data is missing from each column of our current dataset:
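#%%
# Count the missing (null) entries in each column
missing_data = combined_user_df.isnull().sum()

# Recent seaborn versions require x and y to be passed as keyword arguments
g = sns.barplot(x=missing_data.index, y=missing_data.values)
# Rotate the column names so they do not overlap on the x axis
g.set_xticklabels(labels=missing_data.index, rotation=90)

plt.show()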
Running this code cell should produce the following visualization: ...
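To make it clearer what the heights of those bars represent, here is a minimal sketch of the same counting step on a tiny, made-up DataFrame. The name toy_df, its column names, and its values are hypothetical and are not taken from our dataset; only the isnull().sum() pattern matches the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns and values are illustrative only
toy_df = pd.DataFrame({
    "age": [63, np.nan, 71, 58],
    "gender": ["M", "F", None, None],
    "tremors": [True, False, True, False],
})

# isnull() marks every missing cell as True;
# sum() then counts the True values in each column
missing_counts = toy_df.isnull().sum()

# Dividing by the number of rows gives the fraction missing per column
missing_share = missing_counts / len(toy_df)

print(missing_counts)
print(missing_share)
```

Each entry of missing_counts is what becomes one bar in the chart. Plotting missing_share instead may be easier to read when the number of rows is large, since it puts every column on the same 0-to-1 scale.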