Chapter 5. Categorical Data

A categorical variable is one whose possible responses form a set of categories rather than numbers that measure an amount or quantity on a continuous scale. For instance, a person might describe his or her gender as male or female, or a machine part might be classified as acceptable or defective. More than two categories are also possible. For instance, a person in the United States might describe his or her political affiliation as Republican, Democrat, or independent.

Categorical variables may be inherently categorical (such as political party affiliation), with no numeric scale underlying their measurement, or they may be created by categorizing a continuous or discrete variable. Blood pressure is a measure of the pressure exerted on the walls of the blood vessels, measured in millimeters of mercury (mmHg). Blood pressure is usually measured continuously and recorded as specific values such as 120/80 mmHg, but it is often analyzed using categories such as low, normal, prehypertensive, and hypertensive. Discrete variables (those that can take on only specific values within a range) may also be grouped into categorical variables. A researcher might collect exact information on the number of children per household (0 children, 1 child, 2 children, 3 children, etc.) but choose to group this data into categories for the purpose of analysis, such as 0 children, 1–2 children, and 3 or more children. This type of ...
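The grouping of household counts described above can be sketched in a few lines of Python. This is only an illustration: the function name `children_category`, the category labels, and the sample list `households` are hypothetical and do not come from any real data set. The cut points (0, 1 to 2, 3 or more) are the ones named in the text.

```python
from collections import Counter


def children_category(n_children: int) -> str:
    """Map an exact count of children to one of three analysis categories."""
    if n_children < 0:
        raise ValueError("number of children cannot be negative")
    if n_children == 0:
        return "0 children"
    if n_children <= 2:
        return "1-2 children"
    return "3 or more children"


# Hypothetical sample: exact number of children in ten households.
households = [0, 1, 2, 3, 0, 5, 2, 1, 4, 0]

# Tally how many households fall into each category.
counts = Counter(children_category(n) for n in households)

for label, count in counts.items():
    print(f"{label}: {count}")
```

Note that the grouping discards information: once the counts are stored as categories, a household with 3 children can no longer be distinguished from one with 5.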
