3 Exploring and Displaying the Data

Many data analysis mistakes can be avoided by first looking at summaries and visualizations of the data. After completing this chapter you should be able to:

  • Calculate measures of central location, such as mean and median
  • Judge which measure of central location is appropriate for a particular scenario
  • Calculate measures of variation, such as variance and percentiles
  • Measure distance between records
  • Produce a frequency table
  • Interpret a box plot and histogram
  • Describe what an outlier is
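As a small preview of several of these calculations, here is a minimal sketch using only Python's standard library (`statistics`, `collections.Counter`, `math.dist`). The data values and the two example records are hypothetical, invented purely for illustration, and the choice of Python is an assumption rather than the chapter's own tooling.

```python
import math
import statistics
from collections import Counter

# Hypothetical data, for illustration only: monthly counts of reported errors
errors = [3, 5, 2, 8, 5, 4, 21, 5, 3, 6]

# Measures of central location
mean = statistics.mean(errors)      # 6.2, pulled upward by the extreme value 21
median = statistics.median(errors)  # 5.0, unaffected by how large that value is

# Measures of variation
var = statistics.variance(errors)                # sample variance (n - 1 denominator), about 29.96
q1, q2, q3 = statistics.quantiles(errors, n=4)   # quartiles: 25th, 50th, 75th percentiles

# Frequency table: how often each distinct value occurs
freq = Counter(errors)

# Euclidean distance between two hypothetical records (age in years, length of stay in days).
# The variables are left unscaled here; in practice they are usually standardized first.
dist = math.dist((54, 3), (58, 6))  # sqrt(4**2 + 3**2) = 5.0
```

Comparing the mean (6.2) with the median (5.0) shows how one extreme value can pull the mean away from the bulk of the data, which is one reason the choice of central-location measure depends on the scenario.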

3.1 Exploratory Data Analysis

Some data analyses begin without a preconceived hypothesis. Space scientists want to examine samples brought back from the moon to see what elements are present. Marketers want to know about the characteristics of people who buy a given product. Business researchers want to know about the financial management structures of successful firms.

In other cases, a hypothesis is formed prior to the collection of data. Market researchers may want to test a theory that urban residents are more likely to purchase a certain product than rural residents. In the case study we have been looking at, hospital administrators want to test the proposition that no-fault reporting for errors will reduce the number of major medical errors.

In either case, it is good practice to conduct exploratory data analysis (EDA), summarizing and displaying the data to gain a better understanding of it. There is one important distinction:

  • If a hypothesis is developed out of ...
