Handling missing values and data anomalies

Let's do a check to see whether there are any missing values in our dataset:

print(df.isnull().sum())
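Per-column counts and row counts are not the same thing: one row can be missing values in several columns. The following is a minimal sketch on a small hypothetical DataFrame (the column names `feature_a`, `feature_b` and `feature_c` are illustrative, not from the real dataset). It shows both the per-column counts that `df.isnull().sum()` reports and the number of rows that contain at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame standing in for the real dataset
df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [10.0, np.nan, 30.0, 40.0],
    "feature_c": [0.1, 0.2, np.nan, 0.4],
})

# Per-column count of missing values (what df.isnull().sum() reports)
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Number of rows with at least one missing value in any column
rows_with_missing = int(df.isnull().any(axis=1).sum())
print(rows_with_missing)
```

Here the per-column counts add up to three, but only two rows are affected, because the second row is missing values in two columns.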

This prints the number of missing values in each column.

We can see that there are only five rows (out of 500,000 rows) with missing data. With a missing data percentage of just 0.001%, it seems that we don't have a problem with missing data. Let's go ahead and remove those five rows with missing data:
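The percentage quoted above is simply the number of affected rows divided by the total number of rows. The sketch below checks that arithmetic with the figures from the text. It also wraps the same calculation in a small helper for any DataFrame. The helper name `missing_row_percentage` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def missing_row_percentage(frame: pd.DataFrame) -> float:
    """Percentage of rows that contain at least one missing value."""
    # any(axis=1) flags each row with a missing value; the mean of booleans is a fraction
    return float(frame.isnull().any(axis=1).mean() * 100)

# Figures quoted in the text: 5 rows with missing data out of 500,000 rows
missing_pct = 5 / 500_000 * 100
print(f"{missing_pct:.3f}%")

# Usage on a tiny hypothetical frame: 1 of 4 rows has a missing value
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0, 4.0]})
print(missing_row_percentage(toy))
```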

df = df.dropna()
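It is worth confirming that `dropna()` removed exactly the rows we expected and that no missing values remain. The following minimal sketch uses a small hypothetical DataFrame in place of the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy DataFrame with two incomplete rows
df = pd.DataFrame({
    "feature_a": [1.0, np.nan, 3.0, 4.0],
    "feature_b": [10.0, 20.0, np.nan, 40.0],
})

rows_before = len(df)
# dropna() returns a new DataFrame without any row that has a missing value
df = df.dropna()
rows_after = len(df)

print(rows_before - rows_after)  # number of rows dropped
# Sanity check: no missing values remain anywhere in the frame
print(df.isnull().sum().sum())
```

Note that `dropna()` keeps the original index labels, so the remaining rows here are labeled 0 and 3. If a contiguous index is needed later, `df.reset_index(drop=True)` renumbers it.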

At this point, we should also check the data for outliers. In a dataset as massive as this, there are bound to be outliers, which can skew our model. Let's ...
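One common way to screen a numeric column for outliers is to inspect its summary statistics. Values far outside the interquartile range (IQR) can then be flagged using Tukey's fences, 1.5 × IQR beyond the first and third quartiles. This sketch illustrates that general heuristic on a hypothetical column named `value`. It is not necessarily the approach taken in the rest of this section.

```python
import pandas as pd

# Hypothetical numeric column with one obvious outlier
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 500]})

# Summary statistics: a max far above the 75th percentile hints at outliers
print(df["value"].describe())

# Tukey's fences: flag points more than 1.5 * IQR beyond the quartiles
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (df["value"] < lower) | (df["value"] > upper)
print(df[is_outlier])

# Keep only the rows inside the fences
df_filtered = df[~is_outlier]
```

In practice, domain knowledge often gives better bounds than a purely statistical rule, so treat the 1.5 factor as a starting point rather than a fixed threshold.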
