Finding the problematic data

Now that we have our data, let's see how to identify and fix issues with it. In Chapter 1, Introduction to Data Analysis, we learned the importance of examining our data as soon as we get it; it's no coincidence that many of the same inspection techniques will also help us find problems. Examining the output of head() and tail() is always a good first step:

>>> df.head()

In practice, head() and tail() are less thorough than the other techniques we will discuss here, since they show only a handful of rows from each end of the data, but they still give us useful information to start with. Our data is in the wide format, and at a quick glance we can see some potential issues. Sometimes the station field is recorded as ?, while other times it holds a station ID. We have values ...
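To make the station observation concrete, here is a minimal sketch of how a quick look with head() and tail() can be followed by a count of the placeholder value. The DataFrame below is a hypothetical stand-in rather than the chapter's dataset: only the station column name and the ? placeholder come from the text, while the date and reading columns, the STATION_A ID, and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df: only the "station" column name and the "?"
# placeholder come from the text; everything else here is made up.
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=6, freq="D"),
    "station": ["?", "STATION_A", "?", "STATION_A", "STATION_A", "?"],
    "reading": [1.0, 2.5, 0.0, 3.1, 2.2, 0.4],
})

# Quick look at both ends of the data (head/tail return 5 rows by default)
first_rows = df.head(3)
last_rows = df.tail(3)

# A glance suggests "?" appears in station; count how often it really does
station_counts = df["station"].value_counts()
unknown_share = (df["station"] == "?").mean()

print(first_rows)
print(last_rows)
print(station_counts)
print(f"Share of rows with unknown station: {unknown_share:.0%}")
```

Counting the values confirms, across the whole column, what head() and tail() can only hint at from a few rows.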

