How it works...
In this recipe, we removed the outliers of a variable in the Boston House Prices dataset from scikit-learn. We first identified the outliers visually with a boxplot. Next, we created a function to find the limits within which most of the variable's values lie. We then created a Boolean vector that flags the values sitting beyond those limits and, finally, removed the flagged observations from the dataset.
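The flag-and-remove steps can be sketched as follows. This is a minimal illustration, not the recipe's exact code: it assumes the limits come from the inter-quartile range (IQR) proximity rule with a multiplier of 1.5, and it uses a small synthetic dataframe instead of the Boston data. The names find_boundaries, data, and x are hypothetical.

```python
import numpy as np
import pandas as pd


def find_boundaries(df, variable, distance=1.5):
    """Return (lower, upper) limits using the IQR proximity rule.

    Assumption: limits sit `distance` IQRs below the 25th percentile
    and above the 75th percentile of the variable.
    """
    q1 = df[variable].quantile(0.25)
    q3 = df[variable].quantile(0.75)
    iqr = q3 - q1
    return q1 - distance * iqr, q3 + distance * iqr


# Synthetic stand-in data: 200 well-behaved values plus two extreme ones.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(5, 1, 200), [25.0, -15.0]])
data = pd.DataFrame({"x": values})

# Find the limits, then build a Boolean vector flagging values beyond them.
lower, upper = find_boundaries(data, "x")
outliers = (data["x"] < lower) | (data["x"] > upper)

# Keep only the observations that are not flagged.
trimmed = data.loc[~outliers]
```

Negating the Boolean vector with ~ keeps the rows inside the limits, so trimmed has fewer rows than data while data itself is left unchanged.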
To load the data, we first imported the dataset from sklearn.datasets and then used load_boston(). Next, we captured the data in a dataframe using pandas' DataFrame(), indicating that the data is stored in boston_dataset.data and the variable names in boston_dataset.feature_names ...
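The loading pattern described here can be sketched as below. Because load_boston() was removed in scikit-learn 1.2, this sketch substitutes load_diabetes() as a stand-in; that swap is an assumption made only so the example runs on current versions. The returned object exposes the same data and feature_names attributes.

```python
import pandas as pd
from sklearn.datasets import load_diabetes  # stand-in for load_boston()

# Load the dataset object; values and column names are separate attributes.
dataset = load_diabetes()

# Capture the values in a dataframe, using the feature names as columns.
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
```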