O'Reilly logo

Practical Predictive Analytics by Ralph Winters

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Transforming the data

Another way to look at outliers is by first standardizing them to a normal distribution with a mean of 0 and standard deviation of 1. Using standard normal form is convenient, since the properties of the distribution never change, and some critical cut-off points can be committed to memory. For example, for a standard normal distribution, quartile 1 is always at -.67 and quartile 3 is always at +.67, so it is easy to compute the interquartile range to memory, which is 1.34. Using the interquartile range rule to identify outliers means that we will take 1.5 times this amount to derive the value of 2.01, so we will be considering any data point above .67+2.01=2.68, or -.67-2.01=-2.68, to be a possible outlier. This is ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required