One numerical feature and one categorical feature

This is the third possibility for bivariate relationships. Again, we have standard, widely used tools: the boxplot for visualization, and a comparison of means or medians across categories for an initial look at how the categories affect the typical value of a numerical feature.
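As a rough illustration of comparing means and medians across categories, here is a minimal sketch using only the Python standard library. The data and the helper name `summarize_by_category` are hypothetical and chosen just for this example; they do not come from the book's dataset.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy data: (category, numerical value) pairs.
records = [
    ("A", 10.0), ("A", 12.0), ("A", 50.0),
    ("B", 20.0), ("B", 22.0), ("B", 24.0),
]


def summarize_by_category(rows):
    """Return {category: (mean, median)} of the numerical values."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {cat: (mean(vals), median(vals)) for cat, vals in groups.items()}


summary = summarize_by_category(records)
for cat, (avg, med) in sorted(summary.items()):
    print(f"{cat}: mean={avg:.2f}, median={med:.2f}")
```

In this toy data, category A has one large value that pulls its mean well above its median, while the two statistics coincide for category B. That is one reason to look at both when comparing categories.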

First, let's recall what a boxplot is. Although the details of its construction vary between software tools, it is usually built as follows:

  • The graph starts at the minimum value (bottom horizontal line).
  • Then, the box starts at the 25th percentile. The horizontal line inside the box corresponds to the median (50th percentile), and the top of the box corresponds to the 75th percentile.
  • Finally, it ends at the maximum ...
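The five values that define the boxplot described above can be computed directly. The following is a minimal sketch using the standard library's `statistics.quantiles`; the helper name `five_number_summary` and the sample values are assumptions made for illustration, and the `inclusive` method is only one of several percentile conventions.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum that outline a boxplot."""
    # n=4 splits the data into quartiles; "inclusive" treats the data
    # as containing its own minimum and maximum.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),     # bottom horizontal line
        "q1": q1,               # bottom of the box (25th percentile)
        "median": q2,           # line inside the box (50th percentile)
        "q3": q3,               # top of the box (75th percentile)
        "max": max(values),     # top horizontal line
    }


# Hypothetical sample values.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

This sketch follows the minimum-to-maximum construction in the list. Plotting tools may compute percentiles or place the end lines differently, so their output need not match these numbers exactly.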
