Grubbs' test and checking outliers

In statistics, and in data analysis with R, an outlier is an observation that lies far from most of the other observations in a dataset. Outliers are often caused by measurement error.
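Grubbs' test, named in the title, is one way to make "far from most of the other observations" precise for a single suspect value. The sketch below computes the two-sided Grubbs statistic and its critical value in base R. The sample `x` and the significance level `alpha` are invented for illustration, and the test assumes the remaining data are approximately normally distributed.

```r
# Hypothetical sample: nine similar readings and one suspicious value.
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5)
n <- length(x)

# Grubbs' statistic: the largest absolute deviation from the mean,
# measured in units of the sample standard deviation.
G <- max(abs(x - mean(x))) / sd(x)

# Two-sided critical value at significance level alpha (assumed 0.05 here),
# based on the t distribution with n - 2 degrees of freedom.
alpha <- 0.05
t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# The value furthest from the mean is the candidate outlier.
suspect <- x[which.max(abs(x - mean(x)))]
is_outlier <- G > G_crit
```

If `is_outlier` is `TRUE`, the value in `suspect` is flagged. Grubbs' test checks one value at a time, so it is usually rerun on the reduced sample if further outliers are suspected.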

The following function detects the outliers in a chosen attribute of a dataset; it can be called once for each attribute:

    outlierKD <- function(dt, var) {
        var_name <- eval(substitute(var), eval(dt))
        na1 <- sum(is.na(var_name))
        m1 <- mean(var_name, na.rm = T)
        par(mfrow=c(2, 2), oma=c(0,0,3,0))
        boxplot(var_name, main="With outliers")
        hist(var_name, main="With outliers", xlab=NA, ylab=NA)
        outlier <- boxplot.stats(var_name)$out
        mo <- mean(outlier)
        var_name <- ifelse(var_name %in% outlier, NA, var_name)
        ...
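The listing above is cut off, so the rest of the function is not shown here. The steps that are visible, flagging values with `boxplot.stats()` and replacing them with `NA`, can be tried on their own. The following is a minimal sketch of just those steps, not the remainder of `outlierKD`; the data frame `dt` and its column `reading` are made up for the example.

```r
# Hypothetical data frame; the column name "reading" is illustrative only.
dt <- data.frame(reading = c(12, 14, 13, 15, 14, 13, 16, 15, 14, 95, NA))

# boxplot.stats() returns, in $out, the points lying beyond the whiskers
# (by default more than 1.5 times the interquartile range from the hinges).
outlier <- boxplot.stats(dt$reading)$out

# Replace the flagged values with NA and leave every other value unchanged.
cleaned <- ifelse(dt$reading %in% outlier, NA, dt$reading)

# Compare the mean before and after removing the flagged values.
mean_with_outliers <- mean(dt$reading, na.rm = TRUE)
mean_without_outliers <- mean(cleaned, na.rm = TRUE)
```

Comparing `mean_with_outliers` and `mean_without_outliers` shows how strongly a single extreme value can pull the mean, which is why the function records the mean before it replaces anything.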
