How it works...
In this recipe, we replaced the outliers in one variable of the Boston House Prices dataset from scikit-learn with the 5th and 95th percentiles. We first loaded the data as described in the How it works... section of the Trimming outliers from the dataset recipe in this chapter. To replace the outliers, we created a function that takes the dataframe, the variable name, and the 5th and 95th percentiles, and uses NumPy's where() to replace values above the 95th percentile with the 95th percentile and values below the 5th percentile with the 5th percentile.
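As a rough illustration of the function described above, here is a minimal sketch. The function name cap_at_percentiles, its argument names, and the small synthetic dataframe are assumptions made for this example; they are not the recipe's own code or data, and the Boston dataset is not loaded here.

```python
import numpy as np
import pandas as pd


def cap_at_percentiles(df, variable, lower, upper):
    """Return the values of df[variable] with anything above `upper`
    replaced by `upper` and anything below `lower` replaced by `lower`.
    (Hypothetical helper, named for this sketch only.)"""
    return np.where(
        df[variable] > upper,
        upper,  # value is above the upper limit: cap it
        np.where(
            df[variable] < lower,
            lower,         # value is below the lower limit: cap it
            df[variable],  # value is within the limits: keep it
        ),
    )


# Synthetic data with one obvious outlier (an assumption for this sketch).
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]})

# The 5th and 95th percentiles of the variable act as the capping limits.
lower = data["x"].quantile(0.05)
upper = data["x"].quantile(0.95)

data["x_capped"] = cap_at_percentiles(data, "x", lower, upper)
```

The sketch writes the result to a new column so that the original values remain available for comparison; assigning back to the same column would replace them in place.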
NumPy's where() scans each observation and if the value is bigger than the 95th percentile, it replaces it with the 95th percentile; otherwise, it evaluates whether the value is smaller than the 5th percentile, ...