Luk Arbuckle, Khaled El Emam

Anonymizing Health Data

Date: This event took place live on September 13, 2013

Presented by: Luk Arbuckle, Khaled El Emam

Duration: Approximately 60 minutes.

Cost: Free

Questions? Please send email to


How can health data be released to analysts and app developers who desperately want it? Under current legislation, the use and disclosure of health data for secondary purposes is limited: either patients must consent to have their data used, which is often difficult to obtain and can introduce bias, or the data must be de-identified (there are some exceptions, but we won't address them in this webinar).

To ensure that end users get data that is anonymized and highly useful, we focus on the HIPAA Privacy Rule De-identification Standard. We've built our risk-based methodology for anonymizing data around the foundation created by HIPAA's Statistical Method. In this webcast we'll share several of the case studies that we've described in our O'Reilly book Anonymizing Health Data, which is devoted to examples of how we anonymized real-world data sets. In almost every case in which we've anonymized data, there have been new and interesting challenges to overcome.

In this webcast we'll start with the relatively simple de-identification of a cross-sectional disease registry, and then move on to more complex situations such as the de-identification of longitudinal data, free-form text, and geospatial data. Given the limited time, we'll only touch on the anonymization puzzles we've faced and the approaches we've developed to solve them.
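The webinar itself doesn't show code, but as a rough illustration of what measuring re-identification risk on a simple cross-sectional data set can involve, here is a minimal Python sketch. It assumes a common risk measure (1 divided by the size of a record's equivalence class on the quasi-identifiers), a single generalization step (exact age to 10-year bands), and suppression of records in small classes. The quasi-identifiers, the toy records, and all function names (`generalize_age`, `max_risk`, `avg_risk`, `suppress_small_classes`) are hypothetical; this is not the presenters' methodology or tooling.

```python
from collections import Counter

# Hypothetical quasi-identifiers; a real assessment would pick these
# from the data set and the threat model for the release.
QIS = ("age_group", "sex", "region")

# Made-up records for illustration only.
TOY = [
    {"age": 34, "sex": "F", "region": "K1A"},
    {"age": 37, "sex": "F", "region": "K1A"},
    {"age": 52, "sex": "M", "region": "K2P"},
    {"age": 58, "sex": "M", "region": "K2P"},
    {"age": 59, "sex": "M", "region": "K2P"},
    {"age": 71, "sex": "F", "region": "K1A"},
]


def generalize_age(records, width=10):
    """Replace exact age with a band such as '30-39' (one simple generalization)."""
    out = []
    for r in records:
        lo = (r["age"] // width) * width
        g = {k: v for k, v in r.items() if k != "age"}
        g["age_group"] = f"{lo}-{lo + width - 1}"
        out.append(g)
    return out


def class_sizes(records, qis=QIS):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qis) for r in records)


def max_risk(records, qis=QIS):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    return 1.0 / min(class_sizes(records, qis).values())


def avg_risk(records, qis=QIS):
    """Average of 1/class size over all records = number of classes / number of records."""
    return len(class_sizes(records, qis)) / len(records)


def suppress_small_classes(records, k, qis=QIS):
    """Drop records whose equivalence class has fewer than k members."""
    sizes = class_sizes(records, qis)
    return [r for r in records if sizes[tuple(r[q] for q in qis)] >= k]


if __name__ == "__main__":
    gen = generalize_age(TOY)
    print(max_risk(gen), avg_risk(gen))               # 1.0 0.5
    kept = suppress_small_classes(gen, k=2)
    print(len(kept), max_risk(kept), avg_risk(kept))  # 5 0.5 0.4
```

In a risk-based approach, numbers like these would be compared against a threshold chosen for the release context, and generalization or suppression would be adjusted until the threshold is met while keeping as much analytic value as possible. Longitudinal, free-text, and geospatial data need considerably more than this toy sketch.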

About Khaled El Emam

Dr. Khaled El Emam is the Founder and CEO of Privacy Analytics, Inc. He is also an Associate Professor at the University of Ottawa, Faculty of Medicine, a senior investigator at the Children's Hospital of Eastern Ontario Research Institute, and a Canada Research Chair in Electronic Health Information at the University of Ottawa. His main area of research is developing techniques for health data de-identification or anonymization and secure disease surveillance for public health purposes. He has made many contributions to the health privacy area. In addition, he has considerable experience de-identifying personal health information under the HIPAA Privacy Rule Statistical Standard.

Previously Khaled was a Senior Research Officer at the National Research Council of Canada, and prior to that he was head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany. He has co-founded two companies to commercialize the results of his research work. In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by the Journal of Systems and Software based on his research on measurement and quality evaluation and improvement, and ranked second in 2002 and 2005. He holds a Ph.D. from the Department of Electrical and Electronic Engineering, King's College, at the University of London (UK). His website is

About Luk Arbuckle

Luk Arbuckle has been crunching numbers for a decade. He originally plied his trade in the area of image processing and analysis, and then in the area of applied statistics (use R!). Since joining the Electronic Health Information Laboratory (EHIL) at the CHEO Research Institute he has worked on methods to de-identify health data, participated in the development and evaluation of secure computation protocols, and provided all manner of statistical support. As a consultant with Privacy Analytics, he has also been heavily involved in conducting risk analyses on the re-identification of patients in health data.

O'Reilly Strata Rx Conference