Measuring the spread

Once you know the center of a distribution, the next interesting question is how spread out the values are. Do you have a very homogeneous sample, where the vast majority of the values are close to the mean, or do you have a lot of data far away on both sides of the center?

The range is the simplest measure of spread; it is just the difference between the maximal value and the minimal value of a variable. Written as a formula, for a variable v:

Range = max(v) - min(v)

In R, you can use the min() and max() functions from the base installation to calculate the range. The base range() function returns the minimum and the maximum together.

min(Age)
max(Age)
range(Age)

The result is 70. Here is ...
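To make the calls above concrete, here is a minimal sketch on a small, made-up Age vector. The values are invented for illustration and are not the book's dataset, and the names age_range and age_range_alt are hypothetical. The sketch also shows that range() returns the minimum and the maximum as a pair, so getting the spread as a single number takes one more step.

```r
# Hypothetical ages, invented for illustration; not the book's dataset.
Age <- c(17, 25, 34, 52, 87)

min(Age)     # smallest value
max(Age)     # largest value
range(Age)   # vector of length two: the minimum, then the maximum

# The spread as a single number: maximum minus minimum.
age_range <- max(Age) - min(Age)

# diff() applied to the two-element result of range() gives the same value.
age_range_alt <- diff(range(Age))

print(age_range)   # 70 for this made-up vector
```

Either form works; diff(range(Age)) is handy when you already use range() to inspect both extremes.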
