
Illuminating Statistical Analysis Using Scenarios and Simulations by Jeffrey E. Kottemann


48 Obstacles and Maneuvers

The formulaic methods for scaled data in Parts II, III, and IV assume that the sample data itself is normally distributed, or fairly close to it. While this assumption is often warranted, sometimes it is not. The unruly data statistical scenario is a case in point. That data harbors two nonnormal traits that are common obstacles to using the statistical analysis methods we've seen so far for scaled data.

The first obstacle, as you know, is outliers. Figure 48.2a shows the impact of one extreme outlier in a sample of size 30 (the outlier value of 1000 shows up as a little nub in the “More” slot). Figure 48.2b shows the data with the outlier removed. Notice the large impact the outlier has on the sample mean and the even larger impact it has on the sample variance, which in turn will wreak havoc on various other sample statistics, confidence intervals, and significance tests.

[Figure: two bar graphs of frequency (y-axis, 0–20) by data value bin (x-axis). (a) Data sample with one extreme outlier (sample mean = 53, sample variance = 32,036); bins run from 10 to “More”. (b) Data sample with the outlier removed (sample mean = 21, sample variance = 48); bins run from 10 to 70.]

Figure 48.2
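The outlier effect described above can be reproduced with a few lines of Python. This is a minimal sketch, not the book's data: the 29 values in `clean` are made up for illustration, and only the outlier value of 1000 and the sample size of 30 come from the text. The names `clean`, `with_outlier`, and `summarize` are likewise hypothetical.

```python
import statistics

# Hypothetical sample: 29 made-up values centered near 21 (NOT the book's data).
clean = [12, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 30, 33,
         14, 16, 18, 19, 20, 21, 22, 23, 24,
         13, 17, 21, 25, 29]

# Add the single extreme outlier from the text, giving a sample of size 30.
with_outlier = clean + [1000]


def summarize(sample):
    """Return (sample mean, sample variance using n - 1 in the denominator)."""
    return statistics.mean(sample), statistics.variance(sample)


mean_clean, var_clean = summarize(clean)
mean_out, var_out = summarize(with_outlier)

print(f"without outlier: mean = {mean_clean:.1f}, variance = {var_clean:.1f}")
print(f"with outlier:    mean = {mean_out:.1f}, variance = {var_out:.1f}")

# How many times larger each statistic becomes once the outlier is included.
print(f"mean grows by a factor of {mean_out / mean_clean:.1f}")
print(f"variance grows by a factor of {var_out / var_clean:.1f}")
```

Because the variance is built from squared deviations, the one extreme value should inflate it by a far larger factor than it inflates the mean, which is the pattern Figure 48.2 shows.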

Statistical ...
