Tiller. (source: Joseph on Flickr)

Last fall, O’Reilly published the second edition of my report, “Understanding the Chief Data Officer,” for which I spoke to nearly two dozen chief data officers (CDOs). I was looking for the patterns, the ubiquitous challenges that cut across a variety of industries, and I found them—you can download the free report to read about my findings. However, there was also something that jumped out at me by its absence: data ethics. It’s not that today’s CDOs aren’t worried about the ethics of working with data. On the contrary: I found those I interviewed to be highly conscientious and thoughtful people. But data ethics never came up as an overt issue that was being talked about in their organizations or specifically addressed among their top priorities.

I was immediately intrigued, therefore, to come across a talk on data ethics given by Abe Gong at the ODSC conference last November. Gong was serving as CDO at Aspire Health, a company that naturally handles a lot of medical data, which is highly regulated under HIPAA.

I caught up with Gong by telephone a few weeks after his talk to ask him more about the current state of data ethics and the role of the CDO in that area. The following interview has been edited for length and clarity.

Julie Steele: Is data ethics something you have been thinking about for a while, or only more recently, in your role as a health care CDO?

Abe Gong: It's something I've been thinking about for quite a while. In some ways, it’s easier to tell the story of why medical data has important ethical implications, but I think it is true of pretty much all data.

For example, take the story around fake news on Facebook. A few months ago, most people would have said that's a pretty low-stakes area. Sure, some users might see some weird stuff in their feeds, but if people want to post weird stuff online, so be it.

But after the election, a lot of people realized that this actually has real consequences. What we were thinking of as a purely technical problem actually has important human implications.

JS: Are these conversations different in health care?

AG: I’ve had these conversations internally everywhere I’ve worked. The emphasis differs by company.

Surprisingly, in health care, the focus is often on data quality. People have usually thought through potential conflicts of interest in the business model, and the medical community has a long history of taking ethics very seriously. But big data is relatively new in health care, and it would be easy to make mistakes with large-scale consequences. So thoroughly cleaning data as it comes in from outside sources turns out to be ethically important.

JS: From an ethical perspective, what do you wish more people working with data would do before we get to these points where we see the impact?

AG: The first thing we have to do as a community is start talking about it. These conversations are often happening, just not in a systematic way.

If you go back to Facebook, it’s clear that many people there were thinking about potential problems from fake news long before the election. It just hadn't yet bubbled up high enough in the hierarchy to make a difference.

JS: In your view, is that because there are misaligned incentives—companies are thinking about shareholders or profits more than ethics? Or are people just not anticipating the consequences of things we are doing with data?

AG: This is a nuanced thing that I am still trying to figure out. It’s usually too simple to say, “this is a greedy company.”

The thing about data-driven teams is that they usually have one metric that they’re optimizing. That’s a very effective way to get everyone on the same page and experiment quickly. It lets you coordinate around the question of, “How do we improve our business model and make this thing we are all building together work?”

But if your metric overlooks important side consequences or externalities, then things can go wrong.

JS: I really appreciated the talk you gave at ODSC and the four questions you suggested people should be asking:

  1. Are the statistics solid?
  2. Who wins? Who loses?
  3. Are the changes in power structures helping?
  4. How can we mitigate harms?

Is this type of review something you’re already seeing some companies put into action?

AG: Last year, I did an extended ethics review with a group in San Francisco called The Data Guild. I shared these four questions, and then said, “Let's see if we can find ethical risks that we haven't thought about before.” Thirty people from the data community spent almost an hour talking about the data systems we’re building. The goal was to help each other see places where things can go wrong, and where potential unintended consequences are.

It was a really interesting experience. And at the end of the day, we did find some edge cases, some potential places where we would have to be careful about what we were doing.

But the nice thing about working in startups is that companies are still growing. If you’re looking for unintended consequences as you grow, you can map out many of the risks before you actually have to confront them. Then you can course correct as needed.

JS: And that's the idea, isn't it, to be able to see around corners and kind of identify those pitfalls well before they come up?

AG: In a perfect world, yes. But you can't always see problems in the abstract. Once you get up and moving, and start to have data and hands-on experience in a new market, the risks will start to make themselves apparent. You learn in motion.

Tech companies in particular have to ask themselves: how do we build processes in the early stages so we learn to catch things that could bubble up and become big problems later on? In other words, as you are scaling up, you need to make sure problems don’t scale with you.

JS: Who should be driving this process? Is it the role of the CDO? How does this emerge organically in a company that doesn't have it already in place?

AG: Answering that question depends on where data science lives in the organization. It will vary based on the needs of the company. But it’s clear that data people have to have a seat at the table because they will often see the problems first, and they will be the ones who understand the technical road map to correcting them in the simplest, most direct way. This is something that every CDO should have their eye on.

But business functions also have to be involved. At Facebook, for example, it would have been a big deal to say, “We are going to optimize for engagement everywhere except for cases x, y, and z.” That’s not something the data team could do unilaterally—product people would also have to be in on the decision.

JS: Do you think that people are becoming aware of the need to consider ethics, and things are generally moving in a good direction? Or do you feel that the pace at which algorithms are being adopted in areas of public life is outstripping the pace at which we are figuring out how to watch out for the pitfalls?

AG: I go back and forth, honestly. I see the conversation in the data community evolving in a healthy direction. But I'm not sure it is evolving fast enough. I'm also worried that many people who work with data professionally don't get out of their bubble a whole lot and talk to ordinary people, let alone policy makers, civil rights activists, or other people who see more of the downsides.

If those conversations come together in a good way in the next few years—I'm thinking between two and five years—then I believe we’ll land at a constructive societal outcome. If those groups end up being at odds, I’m actually pretty pessimistic. Sooner or later, we’ll end up at a forced decision between really stringent regulation or completely unrestrained corporate action.

JS: What next steps do you suggest? How can people get involved?

AG: One thing that every CDO can do right now is start conducting ethics reviews. That is, review their team’s work from an ethical perspective, not just a technical one. This can be as simple as a 30-minute discussion of those four questions before launching a new data product. It’s very simple, but great insurance against persistent ethical blind spots.

More broadly, we need to start having these kinds of conversations as a community: developing common language and shared norms. That’s something I’ve been working on pretty actively over the last year, along with others—Chris Diehl, Clare Corthell, and Jake Metcalf to name a few. We’re raising awareness within the community and building bridges to other stakeholders. We’re gearing up for a broader launch in the next couple months, so stay tuned!

If this is something your readers want to be involved with, I’d love to get in touch. I’m active on Twitter: @abegong.

This post is part of a collaboration between O’Reilly and Silicon Valley Data Science. See our statement of editorial independence.
