O'Reilly logo

Effective Amazon Machine Learning by Alexis Perrier

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Detecting outliers

Given a variable, outliers are values that are very distant from other values of that variable. Outliers are quite common, and often caused by human or measurement errors. Outliers can strongly derail a model.

To demonstrate, let's look at two simple datasets and see how their mean is influenced by the presence of an outlier.

Consider the two datasets with few samples each: A = [1,2,3,4] and B = [1,2,3,4, 100]. The 5th value in the B dataset,  100, is obviously an outlier: mean(A) = 2.5, while mean(B) = 22. An outlier can have a large impact on a metric. Since most machine learning algorithms are based on distance or variance measurements, outliers can have a high impact on the performance of a model.

Multiple linear regression ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required