In This Chapter
Understanding what an outlier is
Distinguishing between extreme values and novelties
Using simple statistics for catching outliers
Finding the trickiest outliers by using advanced techniques
Errors happen when you least expect them, and that’s also true of your data. In addition, data errors are difficult to spot, especially when your dataset contains many variables of different types and scales (a high-dimensionality data structure).
Data errors can take a number of forms. For example, values may be systematically missing for certain variables, erroneous numbers could appear here and there, and the data could include outliers. Raise a red flag when the following characteristics are met: