Chapter 5

Identifying and Deidentifying Data

Abstract

Data identification is one of the most under-appreciated and least understood issues in data science. Measurements, annotations, properties, and classes of information have no informational meaning unless they are attached to an identifier that distinguishes one data object from all other data objects and that links together all of the information that has been or will be associated with the identified data object. The method of identification and the selection of objects and classes to be identified relate fundamentally to the organizational model of complex data. If the simplifying step of data identification is ignored or implemented improperly, data cannot be shared, and conclusions ...
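The role of the identifier described in the abstract can be illustrated with a minimal Python sketch. It assumes UUIDs as the identification method, which is only one possible choice and is not confirmed by the text above; the names `new_identifier`, `annotate`, and `store` are hypothetical and introduced purely for illustration.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory store: identifier -> list of (property, value) pairs.
store = defaultdict(list)

def new_identifier() -> str:
    """Return a fresh identifier (assumption: a random UUID is acceptable)."""
    return str(uuid.uuid4())

def annotate(object_id: str, prop: str, value) -> None:
    """Attach one property/value pair to the identified data object."""
    store[object_id].append((prop, value))

# Two distinct data objects that happen to share a measurement value.
object_a = new_identifier()
object_b = new_identifier()

annotate(object_a, "height_cm", 172)
annotate(object_a, "blood_type", "O+")
annotate(object_b, "height_cm", 172)

# The identifier keeps the two objects apart and gathers each object's data.
for object_id, assertions in store.items():
    print(object_id, assertions)
```

Without the identifiers, the two `height_cm` values of 172 would be indistinguishable; with them, each measurement belongs to exactly one object, and later annotations can be linked to the same object.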
