Grubbs test and checking outliers

In statistics, an outlier is an observation that lies far from most of the other observations. Outliers are often the result of measurement error.
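To make "far removed" concrete, here is a minimal base-R sketch of the two-sided Grubbs test statistic and its critical value. The data vector `x` and the significance level `alpha` are made up for illustration, and the test assumes the data are approximately normal apart from the suspected outlier.

```r
# Hypothetical sample data; 95 is the deliberately extreme value.
x <- c(10, 12, 11, 13, 12, 11, 14, 95)
n <- length(x)
alpha <- 0.05  # assumed significance level

# Grubbs statistic: largest absolute deviation from the mean, in SD units.
G <- max(abs(x - mean(x))) / sd(x)

# Two-sided critical value based on the t distribution with n - 2 df.
t_crit <- qt(1 - alpha / (2 * n), df = n - 2)
G_crit <- ((n - 1) / sqrt(n)) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# Index of the most extreme observation, and whether it is flagged.
suspect <- which.max(abs(x - mean(x)))
is_outlier <- G > G_crit
```

If `is_outlier` is `TRUE`, the observation at position `suspect` is flagged at the chosen significance level. The test checks one suspected outlier at a time.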

The following script detects the outliers in each attribute:

outlierKD <- function(dt, var) {
    var_name <- eval(substitute(var), eval(dt))
    na1 <- sum(is.na(var_name))
    m1 <- mean(var_name, na.rm = T)
    par(mfrow=c(2, 2), oma=c(0,0,3,0))
    boxplot(var_name, main="With outliers")
    hist(var_name, main="With outliers", xlab=NA, ylab=NA)
    outlier <- boxplot.stats(var_name)$out
    mo <- mean(outlier)
    var_name <- ifelse(var_name %in% outlier, NA, var_name)
    boxplot(var_name, ...
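The core of the function above is the call to `boxplot.stats(...)$out`, which returns the points lying beyond the boxplot whiskers (by default, more than 1.5 times the hinge spread past the hinges). The following standalone sketch isolates that step on a made-up vector `x`; it is an illustration of the idea, not the remainder of the truncated function.

```r
# Hypothetical sample data; 95 is the deliberately extreme value.
x <- c(10, 12, 11, 13, 12, 11, 14, 95)

# boxplot.stats() reports the points beyond the whiskers in $out.
outlier <- boxplot.stats(x)$out

# Replace flagged values with NA, mirroring the ifelse() step in outlierKD.
x_clean <- ifelse(x %in% outlier, NA, x)

# Compare the mean with and without the flagged values.
mean_with <- mean(x)
mean_without <- mean(x_clean, na.rm = TRUE)
```

Comparing `mean_with` and `mean_without` shows how strongly a single extreme value can pull the mean.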

