Chapter 16

Detecting Outliers in Data

IN THIS CHAPTER

• Understanding what an outlier is
• Distinguishing between extreme values and novelties
• Using simple statistics to catch outliers
• Finding the trickiest outliers with advanced techniques

Errors happen when you least expect them, and that's also true of your data. Data errors are also difficult to spot, especially when your dataset contains many variables of different types and scales. Data errors can take several forms. For example, values may be systematically missing from certain variables, erroneous numbers may appear here and there, and the data may include outliers.

In this chapter, you learn what an outlier is and how it differs from a novelty. You also discover techniques to detect and replace the examples that deviate from the data distribution you want your machine learning models to represent.
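The following minimal sketch previews the "simple statistics" approach listed above. It flags extreme values on a single numeric variable using Tukey's fences: a value is an outlier if it falls more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. It then replaces the flagged values with the median of the remaining values. The function names `iqr_bounds`, `find_outliers`, and `replace_outliers` are hypothetical helpers written for this illustration. Median replacement is one common choice, assumed here, and is not necessarily what the chapter itself uses. The sketch relies only on Python's standard `statistics` module, so it runs without pandas or scikit-learn.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Hypothetical helper: return Tukey's fences (lower, upper)."""
    # With n=4, statistics.quantiles returns the three quartiles [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Hypothetical helper: return indices of values outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

def replace_outliers(values, k=1.5):
    """Hypothetical helper: return a copy with outliers set to the inlier median."""
    flagged = set(find_outliers(values, k))
    inliers = [v for i, v in enumerate(values) if i not in flagged]
    fill = statistics.median(inliers)
    return [fill if i in flagged else v for i, v in enumerate(values)]

# A small made-up sample with one obviously extreme value (58.0)
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 58.0, 10.0, 9.7, 10.4]
print(find_outliers(data))     # [6]
print(replace_outliers(data))  # 58.0 becomes 10.1, the median of the other values
```

Using the median to set the replacement value keeps the extreme value itself from influencing the fill. Univariate rules like this one catch only extreme values on one variable at a time. Outliers that show up only in combinations of variables call for the more advanced techniques mentioned above.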

Remember: You don't have to type the source code for this chapter manually; using the ...
