Chapter 10. Categorical Data

A categorical variable is one whose responses fall into a set of categories rather than being numbers that measure an amount or quantity of something on a continuous scale. For instance, a person may describe their gender as “male” or “female,” or a machine part may be classified as “acceptable” or “defective.” More than two categories are also possible: for instance, a person might describe their political affiliation (in the United States) as “Republican,” “Democrat,” “Independent,” or “Other.”

Categorical variables may be inherently categorical, with no numeric scale underlying their measurement (such as political party affiliation), or may be created by categorizing a continuous or discrete variable. For instance, blood pressure is a measure of the pressure exerted on the walls of the blood vessels, measured in millimeters of mercury (mmHg). Blood pressure is usually recorded as a specific measurement such as 120/80 mmHg, but it is often analyzed using categories such as low, normal, prehypertensive, and hypertensive. An example using a discrete variable is the number of children in a household: while the data may be collected as the exact number of children, it may be analyzed in categories such as “0 children,” “1–2 children,” and “3 or more children.”
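The household example can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `categorize_children`, the cutpoint list, and the sample counts are all made up for the sketch, and only the three category labels come from the text above.

```python
from bisect import bisect_right
from collections import Counter

# Counts at which a new category begins: 1 child and 3 children.
# These cutpoints are an assumption chosen to match the labels below.
CUTPOINTS = [1, 3]
LABELS = ["0 children", "1–2 children", "3 or more children"]


def categorize_children(count):
    """Map an exact number of children to one of the three categories."""
    if count < 0:
        raise ValueError("number of children cannot be negative")
    # bisect_right returns how many cutpoints are <= count,
    # which is the index of the matching label.
    return LABELS[bisect_right(CUTPOINTS, count)]


# Hypothetical data: exact number of children in seven households.
household_counts = [0, 2, 1, 4, 0, 3, 2]

categories = [categorize_children(n) for n in household_counts]
frequencies = Counter(categories)

for label in LABELS:
    print(f"{label}: {frequencies[label]}")
```

Once the counts are recoded this way, the exact values are no longer recoverable from the categories: a household with 3 children and one with 4 both end up in “3 or more children.”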

Although the wisdom of classifying continuous or discrete measurements into categories is sometimes debatable (some researchers refer to it as “throwing away information” because it discards all ...
