June 2017
Beginner to intermediate
576 pages
15h 22m
English
Graphical methods are best for initially scanning for outliers. Boxplots, histograms, and normal probability plots are very useful tools.
In this code example, sales data is generated with an average sale of $10,000 and a standard deviation of $3,000. The boxplot shows some data above and below the whiskers of the diagram. Additionally, the histogram shows a gap between the highest bar and the one just below that. These are clues that potential outliers need to be looked at more closely:
set.seed(4070)
# generate sales data
outlier.df <- data.frame(sales = rnorm(100, mean = 10000, sd = 3000))
# plot the data to spot possible outliers
par(mfrow = c(1, 2))
boxplot(outlier.df$sales, ylab = "sales")
hist(outlier.df$sales)
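The points drawn beyond the whiskers can also be listed numerically. The following is a minimal sketch, not part of the original example: it regenerates the same simulated data and uses base R's boxplot.stats(), which by default flags values more than 1.5 times the interquartile range beyond the hinges. The names flagged and outlier.rows are introduced here only for illustration.

```r
set.seed(4070)
# regenerate the same simulated sales data
outlier.df <- data.frame(sales = rnorm(100, mean = 10000, sd = 3000))

# boxplot.stats() computes the statistics that boxplot() draws;
# $out holds the values lying beyond the whiskers (coef = 1.5 by default)
flagged <- boxplot.stats(outlier.df$sales)$out

# keep the data frame rows whose sales value was flagged (illustrative names)
outlier.rows <- outlier.df[outlier.df$sales %in% flagged, , drop = FALSE]
print(outlier.rows)
```

A flagged value is only a candidate for closer inspection. Normally distributed data will occasionally produce points beyond the whiskers by chance, so the list is a starting point rather than a verdict.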
Another way of looking ...