Chapter 7. Visualizing Distributions: Histograms and Density Plots

We often want to understand how a particular variable is distributed in a dataset. To give a concrete example, we will consider the passengers of the Titanic, a dataset we encountered in Chapter 6. There were approximately 1,300 passengers on the Titanic (not counting crew), and we have reported ages for 756 of them. We might want to know how many passengers of each age were on board, i.e., how many children, young adults, middle-aged people, seniors, and so on. We call the relative proportions of different ages among the passengers the age distribution of the passengers.

Visualizing a Single Distribution

We can obtain a sense of the age distribution among the passengers by grouping all passengers into bins with comparable ages and then counting the number of passengers in each bin. This procedure results in a table such as Table 7-1.

Table 7-1. Numbers of passengers with known age on the Titanic.
Age range   Count
0–5            36
6–10           19
11–15          18
16–20          99
21–25         139
26–30         121
31–35          76
36–40          74
41–45          54
46–50          50
51–55          26
56–60          22
61–65          16
66–70           3
71–75           3
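
The binning procedure that produces a table like this can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build Table 7-1: the helper `bin_ages` and the list `sample_ages` are hypothetical, the ages are invented rather than taken from the Titanic data, and the sketch assumes that each bin includes its upper edge (so an age of exactly 5 counts toward 0–5 and an age of 5.5 toward 6–10).

```python
from bisect import bisect_left

# Upper edges of the 15 bins in Table 7-1: 5, 10, ..., 75.
UPPER_EDGES = list(range(5, 80, 5))
# Labels as printed in the table: "0–5", "6–10", ..., "71–75".
LABELS = ["0–5"] + [f"{hi - 4}–{hi}" for hi in UPPER_EDGES[1:]]


def bin_ages(ages):
    """Count how many ages fall into each bin; None (unknown age) is skipped."""
    counts = [0] * len(UPPER_EDGES)
    for age in ages:
        if age is None:
            continue
        if not 0 <= age <= UPPER_EDGES[-1]:
            raise ValueError(f"age {age} is outside the binned range")
        # Assumption: bins are closed on the right, so the first upper
        # edge that is >= age identifies the bin.
        counts[bisect_left(UPPER_EDGES, age)] += 1
    return dict(zip(LABELS, counts))


# Hypothetical ages, invented for illustration only (not the Titanic data).
sample_ages = [0.8, 4, 5, 6, 19, 22, 22, 24, 29, None, 35, 47, 71]
for label, count in bin_ages(sample_ages).items():
    if count:
        print(f"{label:>6}  {count}")
```

Passengers without a reported age are simply left out of the counts, which mirrors the table's restriction to passengers with known age.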

We can visualize this table by drawing filled rectangles whose heights correspond to the counts and whose widths correspond to the width of the age bins (Figure 7-1). Such a visualization is called a histogram. (Note that all ...
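
As a rough sketch of how the rectangles follow from the table, the Python snippet below turns the counts of Table 7-1 into one rectangle per bin and prints a crude text rendering. The function names `histogram_rectangles` and `render_text_histogram` are hypothetical, and the sketch assumes that every bin is five years wide, starting at age 0. The text version draws the bars sideways, so each bar's length stands in for its height.

```python
# Counts from Table 7-1, ordered from the youngest bin to the oldest.
COUNTS = [36, 19, 18, 99, 139, 121, 76, 74, 54, 50, 26, 22, 16, 3, 3]
BIN_WIDTH = 5  # years; assumes every bin spans five years


def histogram_rectangles(counts, bin_width):
    """Return one (left_edge, width, height) tuple per bin."""
    return [(i * bin_width, bin_width, count) for i, count in enumerate(counts)]


def render_text_histogram(rects, max_chars=50):
    """Draw the bars sideways: one row per bin, bar length proportional to height."""
    tallest = max(height for _, _, height in rects)
    lines = []
    for left, width, height in rects:
        bar = "#" * round(height / tallest * max_chars)
        lines.append(f"{left:>2}-{left + width:<2} | {bar} {height}")
    return "\n".join(lines)


rects = histogram_rectangles(COUNTS, BIN_WIDTH)
print(render_text_histogram(rects))
```

With a plotting library, the same tuples can be passed to a bar-drawing routine. In matplotlib, for instance, `plt.bar(lefts, heights, width=widths, align="edge")` places each bar's left side at the bin's lower edge.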
