Improving the model performance by removing outliers

In the Data visualization section, we saw that some predictors have outliers. Outliers are values that are particularly extreme compared to the rest of the data. They are a problem because they tend to distort the results of data analysis, particularly descriptive statistics and correlations. They also have a large influence on the fit of a model, because squaring the residuals magnifies the effect of these extreme data points. For these reasons, it may be necessary to remove these values first to improve the performance of the model.
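As a rough illustration of how a single extreme value distorts a descriptive statistic, and of one common way to filter such values, here is a minimal sketch that uses the interquartile range (IQR) rule. The function name `remove_outliers_iqr`, the multiplier `k=1.5`, and the sample data are assumptions made for this example; they are not taken from the dataset used in this chapter, and the IQR rule is only one of several possible criteria.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only the values inside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional
    multiplier for the IQR rule, not a value prescribed by the text.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Made-up sample: six similar values and one extreme value.
sample = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0]
cleaned = remove_outliers_iqr(sample)

print("mean with outlier:   ", statistics.mean(sample))
print("mean without outlier:", statistics.mean(cleaned))
```

In this made-up sample, the single value 95.0 pulls the mean from 11.5 up to roughly 23.4, which is the kind of distortion described above.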

In some cases, you may be tempted to remove outliers that are influential or have an excessive impact on the summary measures you want to consider (such as the mean ...
