Statistics for Big Data For Dummies by David Semmelroth, Alan Anderson

Chapter 10

Sending Out a Posse: Searching for Outliers

In This Chapter

  • Learning how to identify outliers with formal statistical procedures
  • Seeing how outliers affect statistical tests
  • Finding out how to avoid the problems associated with outliers

An outlier is a member of a dataset that’s significantly larger or smaller than the other values in the dataset. Outliers can appear in all walks of life. For example, the following would be considered outliers:

  • A man who is seven feet tall
  • A woman who is 100 years old
  • A household that has an annual income of $100 million
  • A baseball player who hits .400 during an entire season

In statistical analysis, an outlier is a value that is substantially different from the other values within a sample or a population. For example, suppose you take a sample of housing prices in a small town, with the following results (in thousands of dollars):

240, 270, 290, 305, 332, 348, 371, 404, 2,250

In this case, you would consider the home that’s worth $2.25 million to be an outlier because it’s so much more expensive than the other homes in that town. In fact, it’s more than five times as costly as the next most expensive ...
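To check the arithmetic behind that comparison, here is a minimal Python sketch. It assumes the sample values are in thousands of dollars, so that 2,250 corresponds to $2.25 million. The variable names are illustrative, and the comparison of the mean with the median is an added illustration rather than a procedure taken from the chapter.

```python
from statistics import mean, median

# Housing prices from the sample above (assumed to be in thousands of dollars).
prices = [240, 270, 290, 305, 332, 348, 371, 404, 2250]

ordered = sorted(prices)
largest, runner_up = ordered[-1], ordered[-2]

# How many times more expensive the priciest home is than the next one down.
ratio = largest / runner_up

# One extreme value drags the mean upward but leaves the median unchanged.
mean_all = mean(prices)
median_all = median(prices)
mean_without_largest = mean(ordered[:-1])

print(f"largest / next largest: {ratio:.2f}")
print(f"mean with the largest value: {mean_all:.1f}")
print(f"mean without the largest value: {mean_without_largest:.1f}")
print(f"median with the largest value: {median_all}")
```

The ratio comes out above five, which matches the statement in the text. The gap between the two means shows how much a single extreme value can move a summary statistic.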
