Chapter 7. Anomaly Detection on Nonnormalized Data
In Chapter 6, I showed you three ways to visualize outliers when your data is normally distributed. However, oftentimes you will come across data that isn’t normally distributed. Using methods that assume a normal distribution could lead to false conclusions or misguided decisions by you and your stakeholders. That is why the exploratory tactics covered in Chapter 4 are so important.
In this chapter, I will show you three methods you can implement to visualize outliers when you are working with nonnormalized data. The methods are median absolute deviation, Tukey’s fences, and the modified z-score test.
Understanding Median Absolute Deviation
The median absolute deviation (MAD) is a statistical measure that quantifies the dispersion, or variability, of a dataset. To calculate it, subtract the median from each value and take the absolute value of the result; this gives you the absolute deviation of each data point. Then find the median of those absolute deviations, which gives you the MAD. The mathematical formula to calculate the MAD is as follows:
$$\mathrm{MAD} = \operatorname{median}\left(\left|X_i - \mathrm{Median}\right|\right)$$
where
MAD = median absolute deviation
X_i = each value
Median = median value
The steps to find the MAD are very simple when you break this formula down. Consider this dataset as an example: 5, 10, 12, 15, 18. Here are the steps to find the MAD from this sample dataset:
- Find the median. In this dataset you can see that the median value is ...
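If you want to check the arithmetic outside of Tableau, the calculation described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `median_absolute_deviation` is hypothetical, and it computes the plain, unscaled MAD (median of the absolute deviations from the median) using the standard library's `statistics.median`.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the unscaled median absolute deviation of a list of numbers.

    Hypothetical helper for illustration; not part of Tableau or any library.
    """
    # The median of the raw values is the center point.
    center = median(values)
    # Absolute deviation of each data point from that median.
    absolute_deviations = [abs(x - center) for x in values]
    # The MAD is the median of those absolute deviations.
    return median(absolute_deviations)


# Sample dataset from the text.
data = [5, 10, 12, 15, 18]
mad = median_absolute_deviation(data)
print(mad)
```

Because the MAD is built from medians rather than means, a single extreme value moves it far less than it would move a standard deviation, which is why it suits data that is not normally distributed.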