Note: This glossary is an extension to one compiled by Ron Kohavi and Foster Provost (1998), used with kind permission of Springer Science and Business Media.
A priori is a term borrowed from philosophy meaning “prior to experience.” In data science, an a priori belief is one that is brought to the problem as background knowledge, as opposed to a belief that is formed after examining data. For example, you might say, “There is no a priori reason to believe that this relationship is linear.” After examining data you might decide that two variables have a linear relationship (and so linear regression should work fairly well), but there was no reason to believe, from prior knowledge, that they should be so related. The opposite of a priori is a posteriori.
The rate of correct (incorrect) predictions made by the model over a dataset (cf. coverage). Accuracy is usually estimated using an independent (holdout) dataset that was not used at any time during the learning process. More complex accuracy estimation techniques, such as cross-validation and the bootstrap, are commonly used, especially with datasets containing a small number of instances.
Techniques that find conjunctive implication rules of the form “X and Y → A and B” (associations) that satisfy given criteria.
A quantity describing an instance. An attribute has a domain defined by the attribute type, which denotes the values that ...