Chapter 20. Mosaic Plots

Graphing Categorical Data

Most of the graphs we have studied so far have been of quantitative variables. In a few cases, we have mixed quantitative and categorical variables, usually by making the distinct values of the categorical variable(s) define groups, each one having its own graph. Sometimes, however, all of the variables of interest are categorical. This requires special graphical methods.

Let’s consider a dataset in the epiDisplay package. You will need to install this package, as well as vcd, which includes some functions for working with categorical variables. Here’s how to do that:

> install.packages("epiDisplay")
> install.packages("vcd")
> library(epiDisplay)
> library(vcd)

We will be looking at the ANCdata dataset. You’ll need to get some information about this dataset:

> ?ANCdata

This data is from a study of the types of care given to women with high-risk pregnancies in two clinics. There are three variables, all categorical, and each has only two values, or levels. We would like to know if perinatal mortality (i.e., stillbirth or death of the newborn within seven days) is related to the type of treatment or to the clinic in which care was received. Let’s first look at the relationship between death and anc (treatment). The table() command shown in the following script counts the number of observations in each combination of the two variables:

# Table 20-1
library(epiDisplay)
library(vcd)
attach(ANCdata)
xtab1 = table(death,anc)
...
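
To see what table() does with two categorical variables, without depending on any installed package, here is a minimal sketch in base R. The data frame toy and its values are hypothetical, made up only to mimic the shape of death and anc. They are not the ANCdata values, and the level names "old" and "new" are assumptions.

```r
# Hypothetical toy data, NOT the ANCdata values: two categorical variables
# with two levels each, mimicking the shape of death and anc.
toy <- data.frame(
  death = c("no", "no", "yes", "no", "yes", "no", "no", "yes"),
  anc   = c("old", "new", "old", "new", "old", "old", "new", "new")
)

# table() counts the observations in each combination of the two variables;
# the first argument defines the rows, the second the columns.
xtab_toy <- table(toy$death, toy$anc, dnn = c("death", "anc"))
print(xtab_toy)

# Add row and column totals to the table of counts.
print(addmargins(xtab_toy))

# Column proportions: within each anc group, the share of each death outcome.
print(prop.table(xtab_toy, margin = 2))
```

Referring to the columns as toy$death and toy$anc is one way to avoid attach(). Either approach should give the same counts, so this is a matter of preference rather than a correction to the script above.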
