4 Descriptive Statistics

4.1 Introduction

We have seen that visual descriptions of data can be very useful in giving us an overall idea of many traits of a sample. Depending on the type of variable at hand, we were able to assess frequencies, centrality, dispersion, the shape of the distribution, association between variables or groups, and trends. These types of summaries also allowed us to detect data errors. However, visual descriptions are usually not enough to describe data. There are two main reasons for this:

Figures are subjective: their appearance depends on the choice of scale, bin size, and other attributes.
Numerical summaries usually cannot be determined precisely from charts.
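
Both reasons can be illustrated with a minimal sketch. The sample below is made up for illustration, and the helper `bin_counts` is a hypothetical name, not something from the text. The same data are binned with two different bin widths, while the mean and median are computed directly from the values.

```python
from statistics import mean, median

# Hypothetical sample of 12 measurements (made up for illustration).
data = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9, 9, 9]

def bin_counts(values, width, start=0):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        left = start + ((v - start) // width) * width  # left edge of v's bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

narrow = bin_counts(data, width=2)   # bins of width 2
wide = bin_counts(data, width=10)    # a single bin of width 10

print("width 2: ", narrow)
print("width 10:", wide)
# The numerical summaries do not depend on any binning choice.
print("mean:", mean(data), "median:", median(data))
```

With bins of width 2 the counts suggest two separate clusters, while a bin of width 10 lumps all values together. The mean and median are the same whichever bin width is chosen for the figure.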

Take Figure 4.1, which shows a line chart of monthly hotel registrations in Puerto Rico, an island destination in the Caribbean. There is a seasonal aspect to the hotel registrations. But it is hard to tell from the chart whether hotel registrations have decreased or increased overall. Perhaps there was a drop around 2009 and an increase since then, but it is unclear.
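
One way to see why a numerical summary helps here is the sketch below. It uses synthetic numbers, not the Puerto Rico data: the seasonal pattern, the yearly shifts, and the helper `yearly_means` are all assumptions introduced for illustration. Averaging the monthly values within each year removes the seasonal swings and leaves the overall change visible.

```python
from statistics import mean

# Synthetic monthly values (NOT the Puerto Rico data): a repeating seasonal
# pattern plus a small shift per year that is hard to spot in a line chart.
seasonal = [30, 35, 40, 25, 15, 10, 20, 15, 5, 10, 15, 20]  # hypothetical pattern
yearly_shift = {2008: 0, 2009: -6, 2010: -2, 2011: 1}       # hypothetical shifts

monthly = {
    (year, month): 100 + seasonal[month - 1] + shift
    for year, shift in yearly_shift.items()
    for month in range(1, 13)
}

def yearly_means(series):
    """Average the monthly values within each year."""
    by_year = {}
    for (year, _month), value in series.items():
        by_year.setdefault(year, []).append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}

means = yearly_means(monthly)
for year, avg in means.items():
    print(year, avg)
```

In this synthetic series the monthly values swing by up to 35 units within a year, while the yearly averages differ by only a few units. The averages still show a drop followed by a recovery, which is the kind of pattern a line chart alone can leave unclear.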

The situation seen above is not an exception. Scatterplots may not clearly indicate whether there is a linear association between variables, and histograms may not be enough to visualize the common values of a measurement or the shape of the data. We often need more specific summaries of the data to make decisions. For example, suppose we have the SAT scores of 100 randomly sampled freshman business students. A histogram ...

Principles of Managerial Statistics and Data Science by Roberto Rivera
