Exploratory data analysis

First, we want to see how many individuals of each class we have. This is important because if the class distribution is heavily imbalanced (1 to 100, for example), we will have problems training our classification models. You can access data frame columns using dot notation. For example, df.label returns the label column as a Series object (a single labeled column), not as a new data frame. Both data frames and Series provide many useful methods for calculating summary statistics. The value_counts() method returns the number of occurrences of each unique value in the column:

In []: 
df.label.value_counts() 
Out[]: 
platyhog       520 
rabbosaurus    480 
Name: label, dtype: int64 
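
To judge the balance at a glance, it can help to look at proportions rather than raw counts. The following is a minimal sketch, not the book's own code: it assumes pandas is installed and rebuilds a stand-in df from the counts printed above, since the original dataset is not shown here. The names counts, shares, and imbalance_ratio are illustrative. The bracket form df["label"] is equivalent to df.label.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's df, rebuilt from the counts shown above.
df = pd.DataFrame({"label": ["platyhog"] * 520 + ["rabbosaurus"] * 480})

# Selecting a single column yields a Series, not a DataFrame.
labels = df["label"]
print(type(labels).__name__)

# Absolute counts per class, sorted from most to least frequent.
counts = labels.value_counts()

# normalize=True returns each class's share of the total instead of a count.
shares = labels.value_counts(normalize=True)
print(shares)

# Ratio of the largest class to the smallest; values near 1 mean a balanced set.
imbalance_ratio = counts.max() / counts.min()
print(round(imbalance_ratio, 2))
```

Here the ratio is about 1.08, far from the 1-to-100 kind of imbalance that would cause trouble.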

The class distribution looks okay for our purposes. Now let's explore the features.

We need to ...
