We are now able to import datasets, even big and problematic ones. Next, we need to learn the basic preprocessing routines that prepare the data for the next step of the data science process.
First, if you need to apply a function to only a subset of rows, you can create a mask. A mask is a series of Boolean values (True or False) that tells you, row by row, whether each row is selected.
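As a minimal sketch of the idea, the following builds a mask on a tiny toy DataFrame rather than the real Iris data. The DataFrame `df`, its values and the `size` column are hypothetical, chosen only for illustration. The mask then serves two purposes: selecting rows through Boolean indexing, and restricting an assignment to the selected rows through `.loc`.

```python
import pandas as pd

# Hypothetical toy stand-in for a few rows of the Iris data (illustrative values)
df = pd.DataFrame({'sepal_length': [5.1, 4.9, 6.3, 6.7, 5.8]})

# The comparison is evaluated element by element and yields a Boolean Series: the mask
mask = df['sepal_length'] > 6.0
print(mask.tolist())  # [False, False, True, True, False]

# Boolean indexing keeps only the rows where the mask is True
selected = df[mask]
print(selected.index.tolist())  # [2, 3]

# With .loc, the same mask limits an operation to the selected rows only
df['size'] = 'short'            # hypothetical column, default value for every row
df.loc[mask, 'size'] = 'long'   # overwritten only where the mask is True
print(df['size'].tolist())  # ['short', 'short', 'long', 'long', 'short']
```

Because the mask is an ordinary Series aligned on the DataFrame's index, the same object can be reused for both selection and targeted updates without recomputing the condition.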
For example, let's say we want to select all the rows of the Iris dataset that have a sepal length greater than 6. We can simply do the following:
In: mask_feature = iris['sepal_length'] > 6.0
In: mask_feature
Out:
0      False
1      False
...
146     True
147     True
148     True
149    False
In the preceding simple example, we can immediately see which observations ...