Appendix A. Supplementary Definitions
- Cardinality
-
The cardinality of a data set $x$ is the number of instances (rows) in $x$, denoted $|x|$.
- Metric
-
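A metric is a function $d$ such that for any points $x$, $y$, and $z$ in a domain $\mathcal{X}$, the following properties hold:
-
$d(x, y) \geq 0$, and $d(x, y) = 0$ if and only if $x = y$ (non-negativity)
-
$d(x, y) = d(y, x)$ (symmetry)
-
$d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality)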
Keep in mind that some of the metrics used in differential privacy don’t satisfy all of these axioms. For instance, in the setting of unbounded contributions where data sets have user IDs (see “Data Sets with Unbounded Contributions”), two different data sets may still have an identifier distance of zero.
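The three axioms can be checked by brute force on a small sample of points. The following is a minimal sketch, not a proof: passing on a sample does not establish that a function is a metric everywhere. The names `check_metric_axioms`, `abs_distance`, and `size_distance` are hypothetical helpers introduced here for illustration. In particular, `size_distance` is not the identifier distance mentioned above; it is only a simple stand-in showing how two different data sets can be at distance zero.

```python
from itertools import product


def check_metric_axioms(d, points):
    """Return the names of the metric axioms that d violates on the sample points."""
    violated = set()
    for x, y, z in product(points, repeat=3):
        # Non-negativity, with d(x, y) == 0 exactly when x == y.
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            violated.add("non-negativity")
        # Symmetry.
        if d(x, y) != d(y, x):
            violated.add("symmetry")
        # Triangle inequality.
        if d(x, z) > d(x, y) + d(y, z):
            violated.add("triangle inequality")
    return violated


def abs_distance(x, y):
    """Absolute difference between two integers."""
    return abs(x - y)


def size_distance(x, y):
    """Hypothetical distance that compares only the number of rows."""
    return abs(len(x) - len(y))


# No violations are found for the absolute difference on this sample.
print(check_metric_axioms(abs_distance, [0, 1, 3, 7]))

# Two different data sets of equal size are at distance zero,
# so the first axiom fails.
print(check_metric_axioms(size_distance, [(1, 2), (3, 4), (5,)]))
```

A function like `size_distance`, which satisfies symmetry and the triangle inequality but assigns distance zero to some distinct points, is commonly called a pseudometric.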
- Multisets
-
Multisets are data sets that can have multiple instances of each record. Unlike vectors, multisets don’t care about row ordering.
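As a rough illustration, Python's `collections.Counter` can stand in for a multiset: it records how many times each record occurs and ignores order. The row values below are made up for the example.

```python
from collections import Counter

# Two data sets with the same records in a different row order (made-up values).
rows_a = ["alice", "bob", "alice"]
rows_b = ["alice", "alice", "bob"]

# As ordered lists (vectors), the two data sets differ.
equal_as_vectors = rows_a == rows_b

# As multisets, they are equal: only the count of each record matters.
multiset_a = Counter(rows_a)
multiset_b = Counter(rows_b)
equal_as_multisets = multiset_a == multiset_b

# The cardinality counts every instance, including repeats.
cardinality_a = sum(multiset_a.values())

print(equal_as_vectors, equal_as_multisets, cardinality_a)
```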
- Symmetric difference
-
The symmetric difference between multisets $x$ and $x'$ is the multiset of elements that are either in $x$ or $x'$, but not in their intersection. The symmetric difference between sets $x$ and $x'$ is denoted $x \triangle x'$. That is, the symmetric difference consists of those elements that do not have matches in the other data set.
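One way to compute this for multisets is sketched below, again assuming `collections.Counter` as the multiset representation. The function name `symmetric_difference` and the sample records are hypothetical. Subtracting one `Counter` from another keeps only positive counts, so each subtraction yields the records left unmatched on one side.

```python
from collections import Counter


def symmetric_difference(x, x_prime):
    """Multiset of records in either data set that have no match in the other."""
    a, b = Counter(x), Counter(x_prime)
    # (a - b) holds the unmatched records of x; (b - a) holds those of x_prime.
    return (a - b) + (b - a)


x = ["a", "a", "b"]
x_prime = ["a", "b", "b", "c"]

diff = symmetric_difference(x, x_prime)
print(diff)

# Cardinality of the symmetric difference: the number of unmatched records.
print(sum(diff.values()))
```

Here one `"a"` and one `"b"` are matched across the two data sets, leaving one extra `"a"` from `x` and one extra `"b"` and `"c"` from `x_prime`, three unmatched records in total.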