Statistics is a useful tool in data mining. It can be used to analyze data
and make inferences about it, both to discover useful information and to draw
statistical conclusions about a dataset in a database. Figure 4.1 illustrates
statistical query processing. The population is a collection of objects about which
we try to discover new facts and information. Its members may be known or
unknown. If the population is unknown, then a sample dataset has to be used
to derive facts or trends about the population. On the other hand, if the
population is known, the entire population may be stor ...
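
The distinction between a known population and a sample can be illustrated with a minimal Python sketch. It is not taken from the book: the population (the integers 1 to 1000), the helper name sample_mean, and the fixed seed are all assumptions chosen for illustration. When every member is available, a statistic such as the mean can be computed exactly; when only a sample is available, the same statistic is an estimate.

```python
import random
import statistics


def sample_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Hypothetical helper for illustration; the seed is fixed only so
    that repeated runs draw the same sample.
    """
    rng = random.Random(seed)
    # Draw sample_size distinct members without replacement.
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Assumed toy population: every member is known, so the mean is exact.
population = list(range(1, 1001))
exact = statistics.mean(population)  # 500.5

# If only 100 members could be observed, the mean becomes an estimate.
estimate = sample_mean(population, 100)

print(f"exact mean: {exact}")
print(f"estimate from a sample of 100: {estimate}")
```

The estimate generally differs from the exact value, and the gap tends to shrink as the sample grows. Sampling the whole population reproduces the exact mean.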