# Chapter 10. Categorical Data

A categorical variable is one in which the responses consist of a set of categories rather than numbers that measure an amount or quantity of something on a continuous scale. For instance, a person may describe their gender as “male” or “female,” and a machine part may be classified as “acceptable” or “defective.” More than two categories are also possible: for instance, a person in the United States might describe their political affiliation as “Republican,” “Democrat,” “Independent,” or “Other.”

Categorical variables may be inherently categorical, with no numeric scale underlying their measurement (such as political party affiliation), or they may be created by categorizing a continuous or discrete variable. For instance, blood pressure is a measure of the pressure exerted on the walls of the blood vessels, measured in millimeters of mercury (mm Hg). Blood pressure is usually recorded as specific measurements such as 120/80 mm Hg, but it is often analyzed using categories such as low, normal, prehypertensive, and hypertensive. An example using a discrete variable is the number of children in a household: although the data may be collected as the exact number of children, they may be analyzed in categories such as “0 children,” “1–2 children,” and “3 or more children.”
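To make the idea of categorizing a discrete variable concrete, here is a minimal Python sketch of the number-of-children example. It assumes exactly the three categories named above; the function name `categorize_children` and the household counts are hypothetical and invented only for illustration.

```python
from collections import Counter


def categorize_children(n: int) -> str:
    """Map an exact (discrete) count of children to one of three categories."""
    if n < 0:
        raise ValueError("number of children cannot be negative")
    if n == 0:
        return "0 children"
    if n <= 2:
        return "1–2 children"
    return "3 or more children"


# Hypothetical household data: exact counts as they might be collected.
households = [0, 2, 1, 4, 0, 3, 2, 0]

# Replace each exact count with its category label.
categories = [categorize_children(n) for n in households]

# Tally how many households fall into each category.
counts = Counter(categories)
print(counts)
```

Once the counts are replaced by labels, the analysis no longer distinguishes a household with one child from a household with two, which is the loss of information that the category scheme trades for simplicity.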

Although the wisdom of classifying continuous or discrete measurements into categories is sometimes debatable (some researchers refer to it as “throwing ...

