Python Data Science Essentials - Third Edition by Luca Massaron, Alberto Boschetti

Data preprocessing

We are now able to import datasets, even big, problematic ones. Next, we need to learn the basic preprocessing routines that make the data usable in the next data science step.

First, if you need to apply a function to a subset of rows, you can create a mask. A mask is a series of Boolean values (that is, True or False) that tells you, row by row, whether each row is selected or not.

For example, let's say we want to select all the rows of the Iris dataset that have a sepal length greater than 6. We can simply do the following:

In: mask_feature = iris['sepal_length'] > 6.0
In: mask_feature
Out:   0      False
       1      False
       ...
       146     True
       147     True
       148     True
       149    False

In the preceding simple example, we can immediately see which observations ...
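To make the mask idea concrete without loading the full dataset, here is a minimal sketch. It assumes pandas is installed and uses a small hypothetical DataFrame (`iris_toy`, with made-up values and a `species` column) in place of the real Iris data. The uppercase transformation is an arbitrary stand-in for whatever function you want to apply to the selected rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the Iris dataset; the values are
# made up for illustration and are not read from the real file.
iris_toy = pd.DataFrame({
    'sepal_length': [5.1, 6.3, 6.5, 5.9],
    'species': ['setosa', 'virginica', 'virginica', 'versicolor'],
})

# Boolean mask: True where the row satisfies the condition
mask_feature = iris_toy['sepal_length'] > 6.0

# Index with the mask to keep only the selected rows
selected = iris_toy[mask_feature]

# Combine the mask with .loc to apply a function to the selected rows only;
# rows where the mask is False are left untouched
iris_toy.loc[mask_feature, 'species'] = (
    iris_toy.loc[mask_feature, 'species'].str.upper()
)

print(selected)
print(iris_toy)
```

Indexing with the mask returns only the matching rows, while `.loc[mask, column]` lets you modify just those rows in place.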
