Working with missing data
In this section, we will discuss missing, NaN, or null values in pandas data structures. Missing data is very common in practice. One operation that creates missing data is reindexing:
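>>> import numpy as np
>>> import pandas as pd
>>> df8 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
>>> df8
   a   b   c
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
>>> # Adding a column label that does not exist fills it with NaN
>>> df9 = df8.reindex(columns=['a', 'b', 'c', 'd'])
>>> df9
   a   b   c   d
0  0   1   2 NaN
1  3   4   5 NaN
2  6   7   8 NaN
3  9  10  11 NaN
>>> # Adding a row label that does not exist fills the row with NaN;
>>> # the integer columns are upcast to float to hold NaN
>>> df10 = df8.reindex([3, 2, 'a', 0])
>>> df10
     a     b     c
3  9.0  10.0  11.0
2  6.0   7.0   8.0
a  NaN   NaN   NaN
0  0.0   1.0   2.0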
To detect missing values, we can use the isnull() or notnull() functions on a Series object as well as on a DataFrame object:
>>> df10.isnull()
       a      b      c
3  False  False  ...
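As a minimal sketch of how these two functions behave, the following rebuilds df10 from the reindexing example and derives a few summaries from the resulting Boolean masks. The variable names (null_mask, present_mask, and so on) are illustrative and do not come from the original text.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the reindexing example
df8 = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['a', 'b', 'c'])
df10 = df8.reindex([3, 2, 'a', 0])  # the label 'a' is new, so its row is all NaN

# Element-wise masks: True where a value is missing / present
null_mask = df10.isnull()
present_mask = df10.notnull()

# Booleans sum as 0/1, so this counts the missing values in each column
missing_per_column = null_mask.sum()

# Flag the rows that contain at least one missing value
rows_with_missing = null_mask.any(axis=1)

# The same methods work on a single column, which is a Series
series_mask = df10['a'].isnull()

print(missing_per_column)
print(rows_with_missing)
```

Because notnull() is the element-wise negation of isnull(), either mask can be used for Boolean indexing, for example df10[present_mask['a']] to keep only the rows where column a has a value.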