Anonymizing Health Data

Chapter 10. Medical Codes: A Hackathon

There are a few standard coding systems used in health data, for procedures, diseases, and drugs. We’ve mentioned them a few times already, but a chapter about codes sounds pretty boring. Well, we’re not about to list the codes and leave it at that. You can get books about these codes elsewhere (and they are oh, so interesting to read… to be fair, they’re references, not books to curl up with in an armchair). No, what we’ll look at are the specific ways to anonymize these codes in health data. By now you know the drill: generalization and suppression. But—spoiler alert—we have another trick up our sleeve to keep the original codes within generalized groups. This is a major aid in increasing the utility of data. But we’ll save that one for last, just to add to the anticipation.
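Below is a minimal Python sketch of the generalization-and-suppression drill applied to diagnosis codes. It rests on several assumptions that are ours, not the book's. Codes are ICD-9-CM strings written with their decimal point. Each record carries exactly one code, whereas real claims usually list several per patient. The threshold `k` is a hypothetical minimum group size. The function names are illustrative only. The sketch does not attempt the chapter's later trick of keeping original codes inside generalized groups.

```python
from collections import Counter
from typing import List, Optional


def generalize_icd9(code: str) -> str:
    # Hypothetical helper. Assumes the code carries its decimal point ("250.01").
    # Keeping only the part before it yields the three-character category
    # ("250"), or the letter-prefixed category for V and E codes ("V45", "E880").
    return code.split(".", 1)[0]


def suppress_rare(codes: List[str], k: int) -> List[Optional[str]]:
    # Suppression: any value shared by fewer than k records becomes None.
    counts = Counter(codes)
    return [c if counts[c] >= k else None for c in codes]


def anonymize_codes(codes: List[str], k: int) -> List[Optional[str]]:
    # Generalize first, so fewer values fall below k and need to be suppressed.
    return suppress_rare([generalize_icd9(c) for c in codes], k)


if __name__ == "__main__":
    diagnoses = ["250.01", "250.02", "401.9", "401.1", "V45.81", "250.00"]
    print(anonymize_codes(diagnoses, k=2))
    # ['250', '250', '401', '401', None, '250']
```

In this toy example, generalization merges the three diabetes codes into category 250 and the two hypertension codes into category 401. Only the lone V45 record still falls below the threshold and is suppressed. Applying suppression to the raw codes would have blanked every record, because each full code appears only once.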

We had the chance to apply all of these methods to a data set used for a hackathon known as the Cajun Code Fest^[93] (how awesome is that name?). A hackathon is a competition in which programmers are (figuratively) caged up in some common space to code day and night to accomplish a predefined goal (no programmers were harmed in the making of this hackathon). For the Cajun Code Fest, in Lafayette, Louisiana, registrants were given de-identified claims data for the state and told to come up with something that would improve health care.

Taking advantage of health data requires more than just programming skills, so the organizers of the Cajun Code Fest encouraged people who ...

Anonymizing Health Data by Khaled El Emam, Luk Arbuckle