Practical Predictive Analytics by Ralph Winters

Detecting outliers

Graphical methods are best for initially scanning for outliers. Boxplots, histograms, and normal probability plots are very useful tools.

In this code example, sales data is generated from a normal distribution with a mean sale of $10,000 and a standard deviation of $3,000. The boxplot shows some points above and below the whiskers. Additionally, the histogram shows a gap between the highest bar and the one just below it. These are clues that there are potential outliers that need to be looked at more closely:

set.seed(4070)
# generate sales data
outlier.df <- data.frame(sales = rnorm(100, mean = 10000, sd = 3000))
# plot the data to spot possible outliers
par(mfrow = c(1, 2))
boxplot(outlier.df$sales, ylab = "sales")
hist(outlier.df$sales)
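The visual clues can be followed up numerically. The sketch below is not from the book: it is one possible way to list the values that fall beyond the boxplot whiskers, using base R's boxplot.stats(), and to draw the normal probability plot mentioned earlier with qqnorm() and qqline(). The names flagged and flagged_rows are illustrative, and the 1.5 multiplier is simply the default whisker rule, assumed here rather than chosen for this data.

```r
# Recreate the sales data from the example above so this block runs on its own
set.seed(4070)
outlier.df <- data.frame(sales = rnorm(100, mean = 10000, sd = 3000))

# boxplot.stats() applies the same default rule as boxplot(): values lying
# more than 1.5 times the box length beyond the box edges (hinges) go in $out
flagged <- boxplot.stats(outlier.df$sales, coef = 1.5)$out

# Row positions of the flagged values (illustrative name), for closer inspection
flagged_rows <- which(outlier.df$sales %in% flagged)
outlier.df[flagged_rows, , drop = FALSE]

# Normal probability (Q-Q) plot: points that stray far from the reference
# line in either tail are candidates for a closer look
qqnorm(outlier.df$sales, main = "Normal Q-Q plot of sales")
qqline(outlier.df$sales)
```

Values flagged this way are only candidates for review. Whether they are errors or legitimate extreme sales still has to be decided by looking at them more closely.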

Another way of looking ...
