290 ◾ PRAGMATIC Security Metrics
Before we continue, there is something we need to mention. We are
practitioners and pragmatists by nature, not statisticians or academics. This book is
firmly grounded in the real world. If you have come to this chapter for advice on
chi-squared tests or Poisson distributions, you will be sadly disappointed. We
profess only basic and limited knowledge of statistics and number theory, enough to
get by. For the more complicated stuff, we rely on our old friend Mr. Google and,
on rare occasions, the dog-eared statistics textbooks from our college studies many
11.1 Gathering Raw Data
Metrics, of course, depend on measurements and analysis of the raw data: the
numbers that underpin them. There are many different sources and ways to gather
raw data, and we don't intend to go into great detail here, apart from mentioning a
few techniques that we have found useful in practice.
In situations where it is infeasible, inappropriate, or impossible to measure every
single member of a population, measuring a sample instead will give statistically
valid data for the entire population provided the sample is scientifically selected.
The selection of samples should ideally be randomized and unbiased, meaning
(1) samples should be selected in such a manner that every member of the
population has an equal chance of being selected, and (2) the selection of new samples
should not depend in some way on samples previously selected. The sample also has
to be large enough to be considered statistically representative of the population
from which it is drawn.
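
The two selection conditions above can be illustrated with a minimal sketch in Python, using the standard library's `random` module to draw a simple random sample without replacement. The function name `draw_simple_random_sample`, the list of 500 employee IDs, and the sample size of 25 are all hypothetical, chosen only for illustration. Sampling without replacement does exclude members already drawn, but earlier picks do not otherwise steer later ones. The sketch says nothing about whether 25 is a large enough sample.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size members uniformly at random, without replacement.

    Every member of the population has the same chance of being chosen.
    """
    if sample_size > len(population):
        raise ValueError("sample_size cannot exceed the population size")
    # A private generator: passing a seed makes the draw repeatable,
    # which helps if the selection has to be justified or re-run later.
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical population: employee IDs 1 to 500 from a payroll file.
payroll_ids = list(range(1, 501))

# Hypothetical sample size of 25; the seed value is arbitrary.
sample = draw_simple_random_sample(payroll_ids, 25, seed=42)
print(sorted(sample))
```

Recording the seed alongside the results is one way to let a reviewer reproduce exactly which members were selected.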
Various sampling approaches are used routinely by auditors if they cannot
reasonably check the entire population of interest. For instance, when checking the
accuracy of the payroll, auditors may select a bunch of people at random from the
payroll file and confirm that they were paid correctly. The number in "a bunch"
often depends on their initial findings: if no errors are detected in the initial sample,
Better yet, we find someone who knows and ask him or her to do it for us.
Metrics also require one or more points of reference. Six inches is a measure but pretty
meaningless without reference points. Six inches from something or between two specific points
provides the context and relevance.
100% sampling is generally not cost-effective unless the population is small, for example,
seeking the opinions of all executive managers.
The level of variability in the population is relevant: if the population is quite consistent (i.e.,
low deviation, all members tightly cluster around the mean value), fewer samples are needed.
However, the reason we are measuring is generally that we don't know how variable the
population is, hence the rule of thumb that follows.