Visualization

Visualizing the data first gives us a general idea of how it is structured, what issues may arise, and whether there are any irregularities that we have to take care of.

When the data spans multiple topics or categories, it is important to know how the samples are distributed across them. A uniform class distribution is the easiest to deal with because no category is under-represented or over-represented. In practice, however, the distribution is frequently skewed, with one or more categories dominating. Here we use the seaborn package (https://seaborn.pydata.org/) to compute the histogram of categories and plot it using the matplotlib package (https://matplotlib.org/). Both packages can be installed with pip (pip install seaborn matplotlib). Now let’s display the ...
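As a rough illustration of the idea described above, the following minimal sketch counts how often each category occurs and draws the counts as a bar chart with seaborn and matplotlib. The label list and the helper names category_counts and plot_category_distribution are hypothetical, made up for this example; they are not the dataset or the code used in the book, and seaborn's countplot is just one of several functions that could produce such a plot.

```python
from collections import Counter

# Hypothetical category labels, one per sample (not the book's dataset).
example_labels = ["sports"] * 5 + ["politics"] * 3 + ["science"] * 1


def category_counts(labels):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(labels).most_common()


def plot_category_distribution(labels, path="category_distribution.png"):
    """Draw a bar chart of category counts and save it to `path`."""
    # Imported here so the counting helper works without the plotting packages.
    import matplotlib.pyplot as plt
    import seaborn as sns

    labels = list(labels)
    # Show the bars from the most to the least frequent category.
    order = [category for category, _ in category_counts(labels)]

    fig, ax = plt.subplots()
    sns.countplot(x=labels, order=order, ax=ax)
    ax.set_xlabel("Category")
    ax.set_ylabel("Number of samples")
    fig.tight_layout()
    fig.savefig(path)
    return ax
```

Calling plot_category_distribution(example_labels) writes the chart to category_distribution.png. Bars of very different heights indicate a skewed distribution, while bars of roughly equal height suggest that the classes are close to uniform.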
