We have seen that visual descriptions of data can be very useful in giving us an overall idea of many traits of a sample. Depending on the type of variable at hand, we were able to assess frequencies, centrality, dispersion, shape of the distribution, association between variables or groups, and trends. These types of summaries also allowed us to detect data errors. However, visual descriptions are usually not enough to describe data. There are two main reasons for this:
- Figures are subjective: the choice of scale, bin size, and other attributes affects the impression they give.
- Numerical summaries usually cannot be determined precisely from charts.
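To make the first reason concrete, here is a minimal Python sketch of how bin size alone can change what a histogram seems to say. The sample values and the helper `bin_counts` are invented for illustration; they do not come from any data set in this text.

```python
from statistics import mean

# Hypothetical sample, invented for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 8, 9, 9, 9, 10, 10, 11]

def bin_counts(values, width, start=0):
    """Count the values falling in each bin [start + k*width, start + (k+1)*width)."""
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

narrow = bin_counts(data, width=2)  # six bins of width 2
wide = bin_counts(data, width=6)    # two bins of width 6

print(narrow)      # [1, 5, 3, 1, 5, 3]
print(wide)        # [9, 9]
print(mean(data))  # the mean does not depend on any bin choice
```

With a bin width of 2 the counts show two separate peaks, while a bin width of 6 makes the same sample look flat. The mean of these values is 6 under either choice, which is the kind of precise statement a chart alone cannot provide.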
Take Figure 4.1, which shows the line chart of monthly hotel registrations in Puerto Rico, an island destination in the Caribbean. There is a seasonal aspect to the hotel registrations, but it is hard to tell from the chart whether registrations have decreased or increased overall. Perhaps there was a drop around 2009 and an increase since then, but it is unclear.
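One way a numerical summary can help in a case like this is sketched below in Python: averaging the monthly values within each year removes the seasonal swing and leaves the overall level. The series here is entirely made up (it is not the Puerto Rico data), and the names `seasonal`, `levels`, and `yearly_means` are assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical seasonal swing for 12 months; the values sum to zero.
seasonal = [30, 25, 20, 5, -5, -15, -10, -20, -30, -15, 0, 15]
# Hypothetical underlying level for each of three years.
levels = {"year 1": 100, "year 2": 96, "year 3": 103}

# Build the monthly series as (year, value) pairs.
monthly = [(year, level + s) for year, level in levels.items() for s in seasonal]

def yearly_means(series):
    """Average the monthly values within each year."""
    grouped = {}
    for year, value in series:
        grouped.setdefault(year, []).append(value)
    return {year: mean(values) for year, values in grouped.items()}

averages = yearly_means(monthly)
print(averages)
```

In this invented series the monthly values range from 66 to 133, so a year-to-year change of a few units is easy to miss in a line chart. The yearly averages recover the levels 100, 96, and 103 exactly, because the seasonal component was constructed to sum to zero within each year. Real data would not separate this cleanly.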
The situation seen above is not exceptional. Scatterplots may not clearly indicate whether there is a linear association between variables, and histograms may not be enough to identify the common values of a measurement or the shape of the data. We often need more specific summaries of the data to make decisions. For example, suppose we have the SAT scores of 100 randomly sampled freshman business students. A histogram ...