Chapter 6. Summarized Data Distributions
This chapter explores how to visualize summarized distributions of data.
Making a Basic Histogram
Problem
You want to make a histogram.
Solution
Use geom_histogram() and map a continuous variable to x (Figure 6-1):
library(ggplot2)

ggplot(faithful, aes(x = waiting)) +
  geom_histogram()
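A histogram counts how many observations fall into each bin. To see the numbers behind the bars, one option is to build the plot and read back the data ggplot2 computed for it with layer_data(). This is a sketch that assumes ggplot2 is installed; the variable names p and hist_data are only illustrative.

```r
library(ggplot2)

# Build the basic histogram and store it instead of printing it
p <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram()

# layer_data() returns the computed data for the first layer:
# one row per bin, with the bin edges (xmin, xmax) and the
# number of observations that fell into that bin (count)
hist_data <- layer_data(p)
head(hist_data[, c("xmin", "xmax", "count")])

# Every row of faithful lands in exactly one bin, so the counts
# add up to the number of rows in the data frame
sum(hist_data$count) == nrow(faithful)
```

ggplot2 may print a message suggesting a better binwidth. This is informational and does not prevent the plot from being built.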
Discussion
All geom_histogram() requires is one column from a data frame or a single vector of data. For this example we’ll use the faithful data set, which contains data about the Old Faithful geyser in two columns: eruptions, which is the length of each eruption, and waiting, which is the length of time until the next eruption. We’ll only use the waiting column in this example:
faithful
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
...
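Before plotting, it can help to confirm the structure of the data and the spread of the column you are about to bin. faithful ships with R’s datasets package, so no extra packages are needed for this check. This sketch uses base R only.

```r
# faithful is part of the datasets package, which R attaches by default
str(faithful)

# Summary statistics for the column that will be plotted
summary(faithful$waiting)

# The range of the data is what the histogram divides into bins
range(faithful$waiting)
```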
If you just want to get a quick look at some data that isn’t in a data frame, you can get the same result by passing in NULL for the data frame and giving ggplot() a vector of values. This would have the same result as the previous code:
# Store the values in a simple vector
w <- faithful$waiting

# Pass NULL instead of a data frame and map the vector directly
ggplot(NULL, aes(x = w)) +
  geom_histogram()
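To check that the vector version really matches the data frame version, you can compare the bin counts ggplot2 computes for each. A minimal sketch, assuming ggplot2 is installed; p_df, p_vec, counts_df, and counts_vec are illustrative names.

```r
library(ggplot2)

w <- faithful$waiting

# Same histogram built two ways: from a data frame column and from a bare vector
p_df  <- ggplot(faithful, aes(x = waiting)) + geom_histogram()
p_vec <- ggplot(NULL, aes(x = w)) + geom_histogram()

# layer_data() returns the computed bins; the counts should match bin for bin
counts_df  <- layer_data(p_df)$count
counts_vec <- layer_data(p_vec)$count
identical(counts_df, counts_vec)
```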
By default, the data is grouped into 30 bins. This may be too fine or too coarse for your data. You can change the size of the bins by using binwidth, or you can divide the range of the data into a specific number of bins. The default colors (a dark fill without an outline) can make it difficult to ...
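Both options are arguments to geom_histogram(). binwidth sets the width of each bin in data units (here, minutes of waiting time), and bins sets how many bins to use. The values 5 and 15 below are arbitrary choices for illustration, and the white fill with a black outline is one way to make individual bars easier to tell apart. This is a sketch that assumes ggplot2 is installed; the p_* names and binsize are illustrative.

```r
library(ggplot2)

# Bins that are each 5 minutes wide
p_width <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, fill = "white", colour = "black")

# Divide the range of the data into about 15 bins by computing a width.
# Because ggplot2 aligns bin edges, the number of bins drawn can
# differ slightly from 15.
binsize <- diff(range(faithful$waiting)) / 15
p_range <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = binsize, fill = "white", colour = "black")

# Or ask for a specific number of bins directly
p_bins <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(bins = 15, fill = "white", colour = "black")

# Number of bins actually used in the last plot
nrow(layer_data(p_bins))
```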