Performing outlier detection

Outlier detection is used to find data points that can throw off your analysis. Outliers come in two flavors: univariate and multivariate. A univariate outlier is a data point with an extreme value on a single variable, so it can be spotted by looking at that variable alone. A multivariate outlier is an unusual combination of scores on at least two variables. It is found in multidimensional data and may not stand out on any one variable by itself.
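To make the distinction concrete, here is a minimal sketch using NumPy only. It is not the recipe's own method: it assumes a z-score test for univariate outliers and a Mahalanobis-distance test for multivariate ones, and the function names `univariate_outliers` and `multivariate_outliers` and the cutoff of 3.0 are illustrative choices, not values from the source.

```python
import numpy as np

def univariate_outliers(values, threshold=3.0):
    """Flag points whose z-score on a single variable exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

def multivariate_outliers(data, threshold=3.0):
    """Flag rows whose Mahalanobis distance from the column means exceeds the threshold.

    data is an (n_samples, n_variables) array; the threshold is an assumed cutoff.
    """
    data = np.asarray(data, dtype=float)
    diff = data - data.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0)) > threshold

# Two strongly correlated variables plus one point that breaks the correlation
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.1, size=200)
data = np.column_stack([np.append(x, 2.0), np.append(y, -2.0)])

print(univariate_outliers(data[:, 0])[-1])
print(univariate_outliers(data[:, 1])[-1])
print(multivariate_outliers(data)[-1])
```

The appended point (2, -2) sits about two standard deviations from the mean on each axis, so neither univariate check flags it, but it contradicts the strong x-y correlation, so the Mahalanobis check does. Note that with very small samples a single z-score can never exceed 3 (its maximum is (n-1)/sqrt(n)), so the cutoff should be chosen with the sample size in mind.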

For this recipe, we are going to use the College dataset from An Introduction to Statistical Learning with Applications in R.

How to do it…

  1. First, import the Python libraries that you need:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    # IPython/Jupyter magic: render plots inline in the notebook
    %matplotlib inline
    ...
