Univariate EDA for categorical features

For categorical features, EDA is actually easier, because each feature can take only a limited number of categories. The first thing we want to know is how many observations fall into each category. It is almost always useful to express these counts as a percentage or proportion of the total.
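As a rough illustration of counting and normalizing, here is a minimal sketch that uses pandas `value_counts` on a small, made-up sample of `cut` values (the sample is hypothetical and is not the real diamonds data). Passing `normalize=True` returns proportions instead of raw counts:

```python
import pandas as pd

# Small, made-up sample of 'cut' values (illustrative only, not the real diamonds data)
cut = pd.Series(["Ideal", "Premium", "Ideal", "Good", "Very Good",
                 "Ideal", "Premium", "Fair", "Ideal", "Very Good"], name="cut")

counts = cut.value_counts()                     # absolute frequency of each category
proportions = cut.value_counts(normalize=True)  # relative frequency; sums to 1

# Put both views side by side; the two Series align on their category index
summary = pd.DataFrame({"count": counts,
                        "percent": (proportions * 100).round(1)})
print(summary)
```

Keeping the absolute count next to the percentage is a design choice: a category at 10% means something different when it covers 5 rows than when it covers 5,000.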

Just as the histogram is the default visualization for a numerical feature, the bar plot is the default way to visualize the distribution of a categorical feature, and pandas makes it very easy to produce one. Since we have only three categorical features, we won't create a function like the one we created for numerical features.
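A bar plot of category counts can be sketched with pandas' `Series.plot.bar`, which draws through matplotlib. This minimal sketch again uses a hypothetical sample of `cut` values and orders the bars by the usual diamond cut quality scale (Fair, Good, Very Good, Premium, Ideal) rather than by frequency. It assumes matplotlib is installed. The `Agg` backend and the output filename `cut_barplot.png` are illustrative choices, not part of the original text:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample of 'cut' values (not the real diamonds data)
cut = pd.Series(["Ideal", "Premium", "Ideal", "Good", "Very Good",
                 "Ideal", "Premium", "Fair", "Ideal", "Very Good"], name="cut")

# Order bars by cut quality; fill_value=0 keeps categories absent from the sample
order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
counts = cut.value_counts().reindex(order, fill_value=0)

fig, ax = plt.subplots(figsize=(6, 4))
counts.plot.bar(ax=ax, rot=0)  # one bar per category, labels kept horizontal
ax.set_xlabel("cut")
ax.set_ylabel("count")
ax.set_title("Distribution of cut")
fig.tight_layout()
fig.savefig("cut_barplot.png")  # hypothetical filename; use plt.show() interactively
```

For an ordinal feature such as `cut`, reindexing to the natural order usually makes the plot easier to read than the default descending-frequency order that `value_counts` returns.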

Let's take a look at the cut feature:

feature = categorical_features[0]
count = diamonds[feature].value_counts()
...
