Working with Outliers
An outlier is a data point that is significantly different from the remaining data. Statistical parameters such as the mean and variance are sensitive to outliers. Outliers may also affect the performance of some machine learning models, such as linear regression or AdaBoost. Therefore, we may want to remove or engineer the outliers in the variables of our dataset.
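To make the sensitivity of the mean and variance concrete, here is a minimal sketch using only Python's standard library. The numbers are invented for illustration and do not come from any dataset in this book; the median is included only as a point of comparison.

```python
from statistics import mean, median, pvariance

# Hypothetical sample: six similar values, then the same sample plus one extreme value.
values = [10, 12, 11, 13, 12, 11]
with_outlier = values + [100]

for label, data in [("without outlier", values), ("with outlier", with_outlier)]:
    # The mean and variance shift sharply once the extreme value is present;
    # the median barely moves.
    print(
        f"{label}: mean={mean(data):.2f}, "
        f"variance={pvariance(data):.2f}, median={median(data)}"
    )
```

A single extreme value is enough to pull the mean well away from where most of the data sits and to inflate the variance by orders of magnitude.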
How can we engineer outliers? One way to handle outliers is to perform variable discretization with any of the techniques we covered in Chapter 5, Performing Variable Discretization. With discretization, the outliers fall into the lowest or highest intervals and are therefore treated like the other low or high values of the variable. An alternative ...
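The discretization idea described above can be sketched as follows. This is an illustrative equal-frequency binning written with the standard library only, not the implementation from Chapter 5; the data and the choice of four bins are assumptions made for the example.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical variable with one extreme value at the upper end.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 500]

# Three cut points split the variable into four equal-frequency intervals.
cuts = quantiles(values, n=4)

# Map each value to the index of the interval it falls into (0 = lowest, 3 = highest).
bins = [bisect_right(cuts, v) for v in values]

for v, b in zip(values, bins):
    print(f"value={v:>4}  interval={b}")
```

After binning, the extreme value shares the top interval with the other high values, so a model that consumes the interval index no longer sees its magnitude.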